Thursday, 18 April 2019

Secure programming with GCC and Undefined behaviour


Buffer overflows, heap overflows, double frees, null pointer dereferences, format string bugs, etc. are the typical vulnerabilities that get exploited. Compilers strive hard to extract every bit of performance and assume that undefined behaviour is not part of the program. This matters for many classes of application. However, it also means that compilers can inadvertently introduce security flaws if the program contains undefined behaviour.


For example:

char buffer[BUFLEN];
char *buffer_end = buffer + BUFLEN;
/* The second test relies on buffer + len wrapping around, which is
   undefined behaviour for pointer arithmetic. */
if (buffer + len >= buffer_end || buffer + len < buffer) {
    return ERROR;
}
The code above is trying to check for an out-of-bounds memory access. However, according to the C standard, pointer addition must not yield a value outside the same object (or one past its end), so the compiler can optimise away the second condition, leaving the code vulnerable to buffer overflow. The problem with security vulnerabilities introduced by undefined behaviour is that they show up suddenly with a new version of the compiler or a new optimisation flag, without any code change. Checks like the one above used to be a common idiom in C/C++ for detecting overflow, so when compiler optimisations improved, many applications broke [1]. CERT [2] lists these problems and shows how to rewrite them so that compilers do not optimise them in unintended ways. GCC also provides multiple tools that can spot potential issues and warn users, and in some cases it lets the user disable the offending optimisation. Let's now look at some examples from GCC Bugzilla related to undefined behaviour and how we can detect or handle them.
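As a minimal sketch of the kind of rewrite CERT recommends, the bounds check can be expressed in terms of sizes instead of pointer arithmetic that may wrap (BUFLEN, len and ERROR are the same hypothetical names used above, with len assumed to be an unsigned size_t):

/* Compare sizes instead of forming a possibly out-of-bounds pointer. */
if (len >= BUFLEN) {
    return ERROR;
}
/* buffer + len is now guaranteed to stay within the object. */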


Some interesting Issues

  • Infinite loop generated for non-infinite code - PR53073

int d[16];
int SATD (void)
{
  int satd = 0, dd, k;
  /* When k is 15, the increment expression reads d[16]:
     an out-of-bounds access, which is undefined behaviour. */
  for (dd = d[k = 0]; k < 16; dd = d[++k])
  {
    satd += (dd < 0 ? -dd : dd);
  }
  return satd;
}

The C standard says that it is legal for a pointer to point one element past the end of an array, but accessing that location is undefined. Since d[++k] is evaluated before the k < 16 test, the compiler may assume the access is always in bounds, and therefore that k can never be 16 when the test is evaluated, which turns the loop into an infinite one. More details are in the GCC Bugzilla link above.
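A sketch of an equivalent loop that performs the same computation without the out-of-bounds read:

int SATD (void)
{
  int satd = 0, k;
  for (k = 0; k < 16; k++)
  {
    int dd = d[k];             /* d[k] is only read while k is in bounds */
    satd += (dd < 0 ? -dd : dd);
  }
  return satd;
}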

  • Signed integer overflow - PR30475

According to the C and C++ language standards, overflow of a signed value is undefined behaviour, and a correct (standard-conforming) C/C++ program must never generate a signed overflow. The GCC options -fno-strict-overflow and -fwrapv disable the optimisations that rely on this assumption. Unfortunately, some of these overflow checks used to be popular idioms for preventing buffer overflows in many code bases.
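A minimal sketch of the broken idiom next to a well-defined rewrite (the function names are made up for illustration):

#include <limits.h>

/* Broken idiom: relies on signed wrap-around, which is undefined,
   so the compiler is allowed to fold the test to 0 and drop the check. */
int add_overflows_broken (int a, int b)
{
  return a + b < a;                 /* UB when a + b overflows */
}

/* Well-defined rewrite: test against INT_MAX/INT_MIN before adding. */
int add_overflows (int a, int b)
{
  return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}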

  • Divide by zero is undefined 

#include <stdio.h>
int testdiv (int i, int k) {
    if (k == 0) printf ("found divide by zero\n");
    return (i / k);
}
int main() {
    int i = testdiv (1, 0);
    return (i);
}

This is based on an error found in PostgreSQL 8.1.5 on Solaris 9 SPARC with gcc-4.1. Since k is used as a divisor, the compiler assumed that k cannot be zero and the print statement was optimised away.
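A sketch of a rewrite that keeps the diagnostic without relying on the undefined division ever being reachable when k is zero:

#include <stdio.h>

int testdiv (int i, int k) {
    if (k == 0) {
        printf ("found divide by zero\n");
        return 0;               /* never evaluate i / k when k == 0 */
    }
    return i / k;
}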

  • Dereferencing a NULL pointer  is undefined - PR29968

static unsigned int tun_chr_poll (struct file *file, poll_table *wait)
{
    struct tun_file *tfile = file->private_data;
    struct tun_struct *tun = __tun_get(tfile);
    struct sock *sk = tun->sk;   /* dereferences tun before the NULL check */
    unsigned int mask = 0;

    if (!tun)
        return POLLERR;
    /* ... */
}
Example from the Linux kernel (https://lwn.net/Articles/342330/). Since tun is dereferenced before the check, the compiler can optimise away the NULL check.
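A sketch of the kind of fix, mirroring the snippet above: perform the NULL check before any dereference, so the compiler cannot conclude that tun is non-NULL.

static unsigned int tun_chr_poll (struct file *file, poll_table *wait)
{
    struct tun_file *tfile = file->private_data;
    struct tun_struct *tun = __tun_get(tfile);
    struct sock *sk;
    unsigned int mask = 0;

    if (!tun)
        return POLLERR;
    sk = tun->sk;    /* dereference only after the check */
    /* ... */
}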

  • Calling a NULL Object is undefined - PR68853

gcc-6 exposed undefined behaviour in the Chromium V8 garbage collector: calling a member function on a NULL object is undefined. The option -fno-delete-null-pointer-checks disables the optimisation so that non-conforming code can still work.

  • Reading an uninitialised variable is undefined

The compiler can assign any value to the variable, and to any expression derived from it:
struct timeval tv;
unsigned long junk;   /* deliberately left uninitialised */

gettimeofday (&tv, NULL);
srandom ((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);
As shown in http://kqueue.org/blog/2012/06/25/more-randomness-or-less/, when compiled with a version of LLVM, the entire seed computation is optimised away: the results of gettimeofday() and getpid() are not used at all, and srandom() is called with a garbage value.
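A minimal sketch of a fix, seeding only from defined values (wrapped in a function so it compiles standalone):

#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Dropping "junk" removes the undefined read that allowed the
   compiler to discard the whole seed computation. */
static void seed_random (void)
{
    struct timeval tv;
    gettimeofday (&tv, NULL);
    srandom (((unsigned long) getpid () << 16) ^ tv.tv_sec ^ tv.tv_usec);
}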

  • Pointer arithmetic that wraps - PR54365

Reference:

[2] https://wiki.sei.cmu.edu/confluence/display/c/CC.+Undefined+Behavior#CC.UndefinedBehavior-ub_46

Saturday, 6 January 2018

New Cache Side Channel Attack Using Processor Speculation

Computing systems rely on privilege separation to protect the memory of different processes. In simpler terms, a process can only access the memory pages mapped into its address space via the MMU, and it is the kernel, running in privileged mode, that sets up those mappings.

Cache side channel attacks are not new; they have been demonstrated in various forms over the years. In the newer attack, it is processor speculation that is used to mount the side channel.

Cache Access

Some points about the cache hierarchy before going into the specifics of the attack:
  • Cache hierarchy is shared between processes. 
  • Caches are usually set associative. The cache is divided into sets, and each set contains multiple ways. When accessing a cache line, the index part of the address is used to select the set, then the ways are searched in parallel for a matching tag. If there is a match and the cache line is valid, we hit in the cache (a small sketch of this address decomposition follows after this list).
  • A cache can be indexed and tagged using physical or virtual addresses, giving four combinations.
  • A cache level can be inclusive or exclusive.
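As a sketch of the indexing described above, for a 4-way, 256-set cache like the one in the figure, with an assumed 64-byte line size:

#include <stdint.h>

#define LINE_BITS 6                      /* assumed 64-byte cache lines */
#define SET_BITS  8                      /* 256 sets */

static inline uint64_t cache_set (uint64_t addr)
{
    return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
}

static inline uint64_t cache_tag (uint64_t addr)
{
    return addr >> (LINE_BITS + SET_BITS);
}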


The figure above shows a 4-way set-associative cache. There are 256 sets in the cache, so 8 bits of the address are used to select one. Once a set is selected, any of the 4 ways can hold the line we are interested in, so 4 tag comparisons are done in parallel, qualified by the valid bits of the cache lines.

Cache Side-channel Attack

Since the cache is shared between processes, a process can manipulate the lines in the cache ways so that it knows which cache lines in a set are used by the victim process and which addresses the victim accesses. Flush+Reload is one such side-channel attack, used against inclusive last-level caches: the attacker flushes the cache lines of interest and reloads them after the victim process has run. If the reload time shows a cache hit, the attacker knows the victim has brought that address into the cache.
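A minimal Flush+Reload timing sketch in C, using x86 intrinsics. The probe address and the 100-cycle threshold are assumptions for illustration; in a real attack the probed line is shared with the victim, for example through a shared library.

#include <stdint.h>
#include <x86intrin.h>

/* Returns 1 if reloading *addr was fast, i.e. the victim touched the
   line between the flush and the reload. */
static int probe (volatile uint8_t *addr)
{
    unsigned int aux;
    uint64_t start, end;

    _mm_clflush ((const void *) addr);   /* flush the line out of the cache */
    _mm_mfence ();
    /* ... let the victim run here ... */
    start = __rdtscp (&aux);             /* serialising timestamp */
    (void) *addr;                        /* reload */
    end = __rdtscp (&aux);

    return (end - start) < 100;          /* threshold is illustrative */
}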

Using Processor Out-of-order Vulnerability to Access Memory Belonging to Others

In the newer Meltdown attack, the speculative memory accesses of modern processors are used to perform the loads for a Flush+Reload attack. In modern, highly pipelined superscalar architectures, processor architects rely on speculative execution to keep the pipeline busy. Branch predictors predict not only the branch direction but also the target address. The predicted path is retired and committed only after the branch has actually been resolved. However, as the authors of the Meltdown attack showed, if the speculated path contained memory loads, the cache state is modified even when the prediction turns out to be wrong; the memory hierarchy state is not rolled back.

When a process is executing, its address space has both the user space and the kernel space mapped. However, user-space code cannot access the kernel address space; kernel privilege is required, which is normally obtained via the syscall interface.

We can have a process request a cache line along a path that is speculated but never actually executed. This can be done in many ways, for example with a branch that will never be taken but whose predictor has been trained to predict the other direction, or with a preceding exception that is thrown and then handled. All the details are in the paper, which is a must read [1].

The examples below are from the Google Project Zero blog [4].


In simpler terms, speculative loads can be used to read memory belonging to the kernel via kernel calls. For example, suppose there is a call where we provide a buffer and an offset into some data structure, and the kernel validates this offset before loading from it. Speculation can then be used to perform the load anyway.

struct array {
 unsigned long length;
 unsigned char data[];
};
struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
/* >0x400 (OUT OF BOUNDS!) */
unsigned long untrusted_offset_from_caller = ...;
if (untrusted_offset_from_caller < arr1->length) {
 unsigned char value = arr1->data[untrusted_offset_from_caller];
 unsigned long index2 = ((value&1)*0x100)+0x200;
 if (index2 < arr2->length) {
   unsigned char value2 = arr2->data[index2];
 }
}


For the code above from [4], after it has been speculatively executed, checking whether arr2->data[0x200] is cached or not tells us the value of arr1->data[untrusted_offset_from_caller] & 1.

In the kernel, the eBPF bytecode interpreter and JIT engine can be made to generate code like the above, which is then used to leak kernel memory. Complete information about the attack can be found in [4].

Kernel Page Table Isolation (KPTI) 

Kernel page table isolation from user space has been proposed as a way to prevent this attack [2][3]. Since this is a hardware issue, fixing it in software is costly in terms of performance; it has been reported that some syscall-heavy applications can suffer significant performance losses. We live in a world where isolation is very important, especially when cloud and virtualisation are the norm, and preventing access to one user's data from another is essential. So this is a necessary change even if the performance penalty is significant.


Reference:

[1] https://meltdownattack.com/meltdown.pdf
[2] https://lkml.org/lkml/2017/12/4/709
[3] https://gruss.cc/files/kaiser.pdf
[4] https://googleprojectzero.blogspot.com.au

Saturday, 23 December 2017

Exploiting Buffer Overflow - Part1: Introduction

Whether it is the stack clash vulnerability or a buffer overflow, one has to hijack the control flow of the program and make it execute what the adversary wants. Data Execution Prevention (DEP) and Address Space Layout Randomisation (ASLR) make exploiting buffer overflows harder, but not impossible.

DEP prevents the classic attack of injecting code into an overflowed buffer on the stack and executing it by changing the saved return address to jump to the injected code. DEP needs support from the hardware and allows code execution only from regions explicitly marked executable; only code segments will be allowed to execute.

Return Oriented Programming (ROP) uses short sequences of instructions already present in the binary, called gadgets, which are discovered and reused. Gadgets can be invoked from the overflowed buffer and chained together to perform what the adversary wants. The key is discovering the gadgets and the addresses where they are located. ASLR makes this harder by loading a Position Independent Executable (PIE) at random addresses.

Long-running applications like servers sometimes fork the executable to serve each request. The forked copies keep the same load addresses as the parent, which makes it easier to discover gadgets and create exploits remotely.

gcc can use a stack canary to detect whether a buffer has been overflowed. This is a random value written onto the stack between the local buffers and the saved return address and verified when the call returns. If the canary has been overwritten with something other than the expected value, the program terminates. However, if the program restarts on a crash, as long-running services often do, it is not hard to learn the canary by brute force.

In a series of posts, I plan to document my experimentation with buffer overflow exploitation. First, the basics with an artificially created program; then I will look into using fuzzers to find security issues and use them to create workable exploits.
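As a hedged sketch of the kind of artificially vulnerable program the series will start from (the names and buffer size here are made up for illustration):

#include <string.h>

/* Deliberately vulnerable: no bounds check before strcpy, so a long
   argument overflows buf and overwrites the saved return address
   (observable with ASLR and the stack protector disabled). */
static int vulnerable (const char *input)
{
    char buf[64];
    strcpy (buf, input);
    return buf[0];
}

int main (int argc, char **argv)
{
    return argc > 1 ? vulnerable (argv[1]) : 0;
}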




Wednesday, 20 December 2017

Stack Clash Protection in gcc

In the last post, I explored some of the history behind the stack clash vulnerability and how newer exploits overcame the stack guard page. In this post, I am going to look at how gcc detects this with -fstack-clash-protection.

-fstack-clash-protection probes the allocated pages so that the stack cannot jump over the guard page without accessing it. Thus, if a stack allocation extends into the stack guard page, it will be detected by the OS because the probe touches that page. Larger allocations are broken up and probed as they are allocated, so when a combination of stack allocations exceeds the probe interval, a probe is performed for each interval.

Let's look at this with a contrived example. Common use cases for expanding the stack dynamically are alloca() and VLAs. In the following code snippet, we have an alloca() whose argument is only known at run time.

#include <alloca.h>
#include <string.h>

char *t;
int foo (char *arg)
{
  t = alloca (strlen (arg));
  strcpy (t, arg);
  return 1;
}

gcc for arm64 generates the asm code shown below.

 1 foo:
 2  stp x29, x30, [sp, -32]!
 3  add x29, sp, 0
 4  str x0, [x29, 24]
 5  bl strlen
 6  add x0, x0, 15
 7  and x0, x0, -16
 8  ldr x1, [x29, 24]
 9  adrp x3, t
10  sub sp, sp, x0
11  mov x2, sp
12  mov x0, x2
13  str x2, [x3, #:lo12:t]
14  bl strcpy
15  add sp, x29, 0
16  mov w0, 1
17  ldp x29, x30, [sp], 32
18  ret

Let's look at the asm quickly to understand what is going on before we look at the stack clash probing. The procedure call standard, as part of the ABI, dictates how arguments are passed. Here, x29 is the frame pointer and x30 is the link register. In line 2, x29 and x30 are saved on the stack. Line 3 updates the frame pointer with the stack pointer as part of the prologue of foo. In ARM64, the first argument is passed in register x0 and the return value is also returned in x0. Since the call to strlen in line 5 will destroy the contents of x0, line 4 saves x0 on the stack. Lines 6 and 7 create the alignment required for the alloca. Line 8 loads the previously saved contents of x0 into x1. Line 10 performs the actual alloca by growing the stack. Note that the stack is not accessed at this point, so this allocation can easily be used to bypass the stack guard.

There seem to be some questionable register allocation decisions in the generated code above. I will leave that discussion for another day and show how -fstack-clash-protection inserts probes that let us detect the stack clash vulnerability. The following asm is generated with -fstack-clash-protection.

 1 foo:
 2  stp x29, x30, [sp, -32]!
 3  add x29, sp, 0
 4  str x19, [sp, 16]
 5  mov x19, x0
 6  bl strlen
 7  add x0, x0, 15
 8  and x0, x0, -16
 9  mov x1, sp
10  and x2, x0, -4096
11  sub x2, sp, x2
12  cmp x1, x2
13  beq .L3
14 .L6:
15  sub sp, sp, #4096
16  mov x1, sp
17  str xzr, [sp, 4088]
18  cmp x1, x2
19  bne .L6
20 .L3:
21  and x0, x0, 4095
22  sub sp, sp, x0
23  sub x0, x0, #8
24  str xzr, [sp, x0]
25  mov x2, sp
26  mov x1, x19
27  adrp x3, t
28  mov x0, x2
29  str x2, [x3, #:lo12:t]
30  bl strcpy
31  add sp, x29, 0
32  mov w0, 1
33  ldr x19, [sp, 16]
34  ldp x29, x30, [sp], 32
35  ret

In the above code, the stack is grown in multiples of 4k (the page size). Lines 14 to 20 form the loop in which the stack is grown by 4k and a store to the newly allocated stack is performed at each step. Some subtle target-specific optimisations are done to minimise the performance impact of this, as explained in the gcc post.

Sunday, 17 December 2017

Stack Clash Vulnerability - Introduction

The stack clash vulnerability allows adversaries to corrupt memory and execute arbitrary code. In Linux, the heap grows by way of an explicit system call, brk(). On the other hand, when the stack grows into an unallocated region, it triggers a page fault, and the OS then allocates the memory provided the access does not extend into the stack guard. If the stack grows into an already allocated region belonging to another mapping, such as the heap, the kernel would not know. Adversaries can then use this to inject and execute shell code.

Stack clash attacks are not new. Gael [2] and Rafal [3] presented them in 2005 and 2010 respectively. The attack is performed by allocating a large number of mmapped pages and then performing a deep recursive call so that the stack overflows and collides with the mmapped pages. After these attacks, Linux (and other OSes) added a stack guard below the stack which cannot be mapped; accessing this area triggers a page fault. Compilers such as gcc also provide stack smashing protection that can detect some stack overflows.

Qualys researchers recently demonstrated various ways to still mount these attacks and gain access to Linux-like OSes. Qualys showed in their research that it is possible to jump over the guard page. This involved:
  • Clashing the stack with a large stack allocation and bringing the stack pointer back into its own region. One way to do this is with a recursive call.
  • Jumping over the stack guard page into the other region. Qualys [1] lists various ways to do this, including using glibc's vfprintf() function, which allocates a stack buffer that is not fully written. Refer to [1] for complete details. A minimal sketch of the idea follows after this list.
  • Smashing the stack into that region.
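A minimal sketch (with hypothetical names) of why a large, attacker-influenced allocation can jump the guard page: the stack pointer moves in one large step, so the first access may already land in another mapping.

#include <stddef.h>

/* Hypothetical example: a VLA sized by attacker-controlled input moves
   the stack pointer far past the guard page in a single step without
   ever touching the guard page itself. */
void process (size_t attacker_controlled_len)
{
    char scratch[attacker_controlled_len];  /* one large jump of sp */
    scratch[0] = 0;                         /* may already be in the heap
                                               or another mmapped region */
}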
The gcc -fstack-check implementation aims to prevent this but unfortunately fails to do so. Jeff Law from Red Hat posted a series of patches [5] to gcc to handle this, the kernel also increased the stack guard size [6], and there were glibc patches that fixed some of the associated issues.

In the next post, I will go into the details of how gcc was modified to detect the stack clash vulnerability.

Reference:

[1] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[2] https://cansecwest.com/core05/memory_vulns_delalleau.pdf
[3] http://invisiblethingslab.com/resources/misc-2010/xorg-large-memory-attacks.pdf
[4] https://access.redhat.com/security/vulnerabilities/stackguard
[5] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01112.html
[6] https://patchwork.kernel.org/patch/9796395/