Saturday 23 December 2017

Exploiting Buffer Overflow - Part1: Introduction

Whether it is the stack clash vulnerability or a buffer overflow, the adversary has to hijack the control flow of the program and make it execute what they want. Data Execution Prevention (DEP) and Address Space Layout Randomisation (ASLR) make exploiting buffer overflows harder, but not impossible.

DEP prevents an adversary from injecting code into an overflowed stack buffer and then executing it by overwriting the saved return address to jump to the injected code. DEP needs hardware support (such as the NX bit on x86 or XN on ARM) and allows code to execute only from memory regions that are explicitly marked as executable. Typically, only code segments are allowed to execute.
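On Linux, one way to see DEP at work is to look at the permissions of the stack mapping in /proc/self/maps. The following is a minimal sketch, assuming a Linux system with procfs. The function name stack_is_executable is hypothetical. It only shows whether the stack is marked executable. Actually enforcing that depends on the hardware support described above.

```c
/* Sketch (Linux-specific assumption): report whether the [stack]
 * mapping of the current process is marked executable, by reading
 * /proc/self/maps. With DEP in effect it normally shows "rw-p". */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if [stack] is executable, 0 if not, -1 if unknown. */
int stack_is_executable (void)
{
  FILE *maps = fopen ("/proc/self/maps", "r");
  char line[512];
  int result = -1;

  if (maps == NULL)
    return -1;
  while (fgets (line, sizeof line, maps) != NULL)
    {
      if (strstr (line, "[stack]") == NULL)
        continue;
      /* Lines look like "start-end perms offset dev inode path";
         perms is four characters such as "rw-p". */
      char perms[5] = {0};
      if (sscanf (line, "%*s %4s", perms) == 1)
        result = (perms[2] == 'x');
      break;
    }
  fclose (maps);
  return result;
}
```

On a typical modern toolchain and kernel, this returns 0 because the stack is mapped read/write but not executable.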

Return Oriented Programming (ROP) gets around DEP by reusing short sequences of instructions that already exist in the binary, called gadgets, each of which ends in a return. Gadget addresses can be placed in the overflowed buffer so that the gadgets are chained together to do what the adversary wants. The key to this is discovering gadgets and the addresses where they are located. ASLR makes this harder by loading Position Independent Executables (PIE) at random addresses.
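To make the idea of chaining gadgets concrete, here is a toy simulation in C. It is not real machine code and not how any particular exploit works. The "gadgets" are small C functions, their "addresses" are indices into a table, and the simulated ret pops the next address off an attacker-supplied stack. All names (struct cpu, run_chain, the G_* constants) are hypothetical.

```c
/* Toy model of ROP (a simulation, not real machine code): each
 * "gadget" is a tiny operation ending in a simulated `ret`, which
 * pops the next gadget address off an attacker-controlled stack. */
#include <assert.h>
#include <stdint.h>

struct cpu
{
  uint64_t r0, r1;
  const uint64_t *sp;   /* points into the attacker's payload */
};

/* Hypothetical gadget "addresses" (indices into the gadget table). */
enum { G_POP_R0, G_POP_R1, G_ADD_R0_R1, G_HALT };

static void pop_r0 (struct cpu *c) { c->r0 = *c->sp++; }   /* pop r0; ret */
static void pop_r1 (struct cpu *c) { c->r1 = *c->sp++; }   /* pop r1; ret */
static void add_r0_r1 (struct cpu *c) { c->r0 += c->r1; }  /* add r0, r1; ret */

/* Run the chain: every simulated `ret` pops the next gadget address. */
uint64_t run_chain (const uint64_t *payload)
{
  static void (*const gadgets[]) (struct cpu *) = { pop_r0, pop_r1, add_r0_r1 };
  struct cpu c = { 0, 0, payload };

  for (;;)
    {
      uint64_t addr = *c.sp++;   /* the simulated `ret` */
      if (addr == G_HALT)
        break;
      assert (addr < sizeof gadgets / sizeof gadgets[0]);
      gadgets[addr] (&c);
    }
  return c.r0;
}
```

For example, the payload { G_POP_R0, 5, G_POP_R1, 7, G_ADD_R0_R1, G_HALT } leaves 12 in r0. The data words sit on the stack between gadget addresses, just as operands for pop gadgets do in a real ROP chain.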

Long-running applications such as servers sometimes fork copies of themselves to serve each request. The forked children inherit the parent's memory layout and therefore share the same start address. This makes it easy to locate gadgets and create exploits remotely.
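The point about forking servers can be checked directly. The following is a minimal POSIX sketch, assuming Linux or another system with fork(). child_sees_same_address is a hypothetical helper that compares the address of the same local variable in a parent and in its forked child.

```c
#define _POSIX_C_SOURCE 200809L   /* fork(), pipe(), waitpid() */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child sees `local` at the same address as the
   parent, 0 if not, -1 on error. */
int child_sees_same_address (void)
{
  int local = 0;
  uintptr_t parent_addr = (uintptr_t) &local;
  uintptr_t child_addr = 0;
  int fds[2];

  if (pipe (fds) != 0)
    return -1;
  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: report where `local` lives in its copy of the address space. */
      uintptr_t addr = (uintptr_t) &local;
      ssize_t n = write (fds[1], &addr, sizeof addr);
      _exit (n == (ssize_t) sizeof addr ? 0 : 1);
    }
  /* Parent: read the child's answer and reap the child. */
  close (fds[1]);
  ssize_t n = read (fds[0], &child_addr, sizeof child_addr);
  close (fds[0]);
  waitpid (pid, NULL, 0);
  if (n != (ssize_t) sizeof child_addr)
    return -1;
  return child_addr == parent_addr;
}
```

Even with ASLR enabled, this returns 1. Randomisation happens when a program is exec'd, not when it forks, so every child of the same server shares one layout.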

gcc can use a stack canary to detect whether a buffer has been overflowed. The canary is a random value that is written into the stack frame between the local buffers and the saved return address, and verified before the function returns. If the canary has been overwritten with a different value, the program terminates. However, if the program restarts, as with the long-running forking servers above, it is not hard to learn the canary using brute force.
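The brute-force argument can be illustrated with a toy simulation. The sketch makes three assumptions. First, the canary is identical in every forked child. Second, an overflow can overwrite just the first k bytes of the canary. Third, the attacker can tell whether the child crashed. child_survives is a hypothetical stand-in for that crash/no-crash signal, and the other names are also hypothetical.

```c
/* Toy model (a simulation, not an exploit): recover an 8-byte canary
 * one byte at a time against a server whose children all share it. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_SIZE 8

/* Hypothetical oracle: the child survives an overflow that overwrites
   the first `len` canary bytes with `guess` only if they all match. */
static int child_survives (const uint8_t *canary, const uint8_t *guess, size_t len)
{
  return memcmp (canary, guess, len) == 0;
}

/* Recovers the canary byte by byte; returns the number of attempts. */
unsigned brute_force_canary (const uint8_t *canary, uint8_t *recovered)
{
  unsigned attempts = 0;

  for (size_t i = 0; i < CANARY_SIZE; i++)
    {
      int found = 0;
      for (unsigned b = 0; b < 256 && !found; b++)
        {
          recovered[i] = (uint8_t) b;   /* bytes 0..i-1 are already correct */
          attempts++;
          found = child_survives (canary, recovered, i + 1);
        }
      assert (found);   /* one of the 256 values must match */
    }
  return attempts;
}
```

Because each byte is guessed separately, the worst case is 8 * 256 = 2048 attempts for an 8-byte canary. Guessing the whole value at once would instead need up to 2^64.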

In a series of posts, I plan to document my experiments with buffer overflow exploitation. First I will cover the basics, using an artificially created program. Then I will look into using fuzzers to find security issues, and use them to create workable exploits.




Wednesday 20 December 2017

Stack Clash Protection in gcc

In the last post, I explored some of the history behind the stack clash vulnerability and how newer exploits overcome the stack guard page. In this post, I am going to look at how gcc detects this with -fstack-clash-protection.

-fstack-clash-protection probes newly allocated stack pages so that the stack pointer cannot jump over the guard page without accessing it. If a stack allocation extends into the stack guard page, a probe will access that page and the OS will detect it. Large allocations are broken into chunks of at most one probe interval, and each chunk is probed as it is allocated. Thus, when an allocation (or a combination of allocations) exceeds the probe interval, a probe is performed for every interval.
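The guarantee can be modelled in a few lines of C. This is a simplified model, not gcc's implementation. Addresses are plain integers, only one address per step is "touched", and PROBE_INTERVAL is assumed to be 4096 to match the page size in the asm later in this post. The helper names are hypothetical.

```c
/* Model of the idea behind -fstack-clash-protection (not gcc's code):
 * move a simulated stack pointer down by `size` bytes and report
 * whether any touched address lands in the guard page [lo, hi). */
#include <assert.h>
#include <stdint.h>

#define PROBE_INTERVAL 4096u   /* assumption: one 4 KiB page */

static int in_guard (uint64_t addr, uint64_t lo, uint64_t hi)
{
  return addr >= lo && addr < hi;
}

/* Without probing, only the final sp is touched (e.g. by the first
   write to the buffer), so a large size jumps over the guard. */
int unprobed_hits_guard (uint64_t sp, uint64_t size, uint64_t lo, uint64_t hi)
{
  return in_guard (sp - size, lo, hi);
}

/* With probing, sp moves at most one interval at a time and is touched
   after every step, so a guard of at least one interval can't be skipped. */
int probed_hits_guard (uint64_t sp, uint64_t size, uint64_t lo, uint64_t hi)
{
  assert (hi - lo >= PROBE_INTERVAL);
  while (size > 0)
    {
      uint64_t step = size < PROBE_INTERVAL ? size : PROBE_INTERVAL;
      sp -= step;
      size -= step;
      if (in_guard (sp, lo, hi))   /* this probe would fault */
        return 1;
    }
  return 0;
}
```

With a 4 KiB guard page 64 KiB below the stack pointer, a 128 KiB allocation jumps straight over it when only the final stack pointer is touched. The probed version touches the guard page on the seventeenth step. Consecutive probes are at most one interval apart, so this holds for any guard page that is at least one interval in size.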

Let's look at this with a contrived example. Common ways of expanding the stack dynamically are alloca() and variable-length arrays (VLAs). In the following code snippet, the argument to alloca() is only known at run time.

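#include <alloca.h>  /* alloca() */
#include <string.h>  /* strlen(), strcpy() */

char *t;
int foo (char *arg)
{
  /* Size known only at run time. strlen() excludes the terminating
     NUL, so strcpy() writes one byte more than requested. */
  t = alloca (strlen (arg));
  strcpy (t, arg);
  return 1;
}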

For arm64, gcc generates the asm code shown below.

 1  foo:
 2    stp x29, x30, [sp, -32]!
 3    add x29, sp, 0
 4    str x0, [x29, 24]
 5    bl strlen
 6    add x0, x0, 15
 7    and x0, x0, -16
 8    ldr x1, [x29, 24]
 9    adrp x3, t
10    sub sp, sp, x0
11    mov x2, sp
12    mov x0, x2
13    str x2, [x3, #:lo12:t]
14    bl strcpy
15    add sp, x29, 0
16    mov w0, 1
17    ldp x29, x30, [sp], 32
18    ret

Let's quickly look at the asm to understand what is going on before we look at stack clash probing. The Procedure Call Standard, which is part of the ABI, dictates how arguments are passed. Here, x29 is the frame pointer and x30 is the link register. In line 2, x29 and x30 are saved on the stack. Line 3 sets the frame pointer to the stack pointer as part of the prologue of foo. In ARM64, the first argument is passed in register x0 and the return value is also returned in x0. Since the call to strlen in line 5 will overwrite x0, line 4 saves x0 (arg) on the stack. Lines 6 and 7 round the length up to the 16-byte alignment required for the alloca. Line 8 reloads the previously saved x0 into x1, the second argument of strcpy. Line 10 performs the actual alloca by moving the stack pointer down. Note that the newly allocated stack is not accessed at this point, so a large enough allocation can easily be used to jump over the stack guard.
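Lines 6, 7 and 10 can be written out in C to make the arithmetic explicit. This is a sketch in which the stack pointer is just an integer. The names align16 and unprotected_alloca are hypothetical.

```c
/* C rendering of lines 6, 7 and 10 of the unprotected asm: round the
 * alloca() size up to a multiple of 16 and move sp down. Nothing in
 * the newly allocated range is touched. */
#include <assert.h>
#include <stdint.h>

uint64_t align16 (uint64_t n)
{
  return (n + 15) & ~(uint64_t) 15;   /* add x0, x0, 15; and x0, x0, -16 */
}

uint64_t unprotected_alloca (uint64_t sp, uint64_t n)
{
  assert (sp % 16 == 0);              /* AArch64 keeps sp 16-byte aligned */
  return sp - align16 (n);            /* sub sp, sp, x0 */
}
```

For strlen(arg) == 100, the stack pointer moves down by 112 bytes. For a 1 MiB string, it moves down by 1 MiB in a single instruction, and none of the pages in between is touched.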

There seem to be some poor register allocation decisions in the generated code above. I will leave that discussion for another day. Instead, I will show how -fstack-clash-protection inserts probes that allow us to detect the stack clash vulnerability. The following asm is generated with -fstack-clash-protection.

 1  foo:
 2    stp x29, x30, [sp, -32]!
 3    add x29, sp, 0
 4    str x19, [sp, 16]
 5    mov x19, x0
 6    bl strlen
 7    add x0, x0, 15
 8    and x0, x0, -16
 9    mov x1, sp
10    and x2, x0, -4096
11    sub x2, sp, x2
12    cmp x1, x2
13    beq .L3
14  .L6:
15    sub sp, sp, #4096
16    mov x1, sp
17    str xzr, [sp, 4088]
18    cmp x1, x2
19    bne .L6
20  .L3:
21    and x0, x0, 4095
22    sub sp, sp, x0
23    sub x0, x0, #8
24    str xzr, [sp, x0]
25    mov x2, sp
26    mov x1, x19
27    adrp x3, t
28    mov x0, x2
29    str x2, [x3, #:lo12:t]
30    bl strcpy
31    add sp, x29, 0
32    mov w0, 1
33    ldr x19, [sp, 16]
34    ldp x29, x30, [sp], 32
35    ret

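In the code above, the stack grows in multiples of 4 KiB (the page size). Lines 10 to 13 compute the target stack pointer for the page-sized part of the allocation and skip the loop if that part is zero. Lines 14 to 19 are the loop in which the stack is grown by 4 KiB and a store to the newly allocated page is performed. Lines 21 to 24 then allocate the remaining bytes (less than 4 KiB) and probe them with a single store. Some subtle target-specific optimizations are performed to minimize the performance impact, as explained in the gcc post.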
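The probing sequence in lines 7 to 24 can be transliterated into C to check where the probes land. This is a simulation that holds sp in an integer and records each str xzr in probes[] rather than executing it. probed_alloca is a hypothetical name.

```c
/* C transliteration of lines 7-24 of the protected asm (a simulation:
 * each probe store is recorded, not executed). Updates *sp and returns
 * the number of probes written to probes[]. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

size_t probed_alloca (uint64_t *sp, uint64_t n, uint64_t *probes, size_t max_probes)
{
  size_t count = 0;
  uint64_t size = (n + 15) & ~(uint64_t) 15;          /* lines 7-8 */
  uint64_t target = *sp - (size & ~(uint64_t) 4095);  /* lines 10-11 */

  while (*sp != target)                               /* lines 12-19 */
    {
      *sp -= 4096;                                    /* sub sp, sp, #4096 */
      assert (count < max_probes);
      probes[count++] = *sp + 4088;                   /* str xzr, [sp, 4088] */
    }
  size &= 4095;                                       /* line 21 */
  *sp -= size;                                        /* line 22 */
  assert (count < max_probes);
  probes[count++] = *sp + size - 8;                   /* lines 23-24 */
  return count;
}
```

Starting from sp = 0x100000 with strlen(arg) == 10000, the loop runs twice and probes 0xffff8 and 0xfeff8. The remaining 1808 bytes get one probe at 0xfdff8. Consecutive probes are 4096 bytes apart, so the allocation cannot step over a 4 KiB guard page without touching it.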

Sunday 17 December 2017

Stack Clash Vulnerability - Introduction

The stack clash vulnerability allows adversaries to corrupt memory and execute arbitrary code. In Linux, the heap grows through an explicit system call, brk(). The stack, on the other hand, grows implicitly. When it grows into an unallocated region, this triggers a page fault, and the OS then allocates the memory as long as it doesn't extend into the stack guard. If the stack grows into an already allocated region belonging to something else, such as the heap, the kernel has no way of knowing. Adversaries can then use this to inject and execute shellcode.
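The explicit nature of heap growth is easy to see with sbrk(), the C library wrapper around brk(). The following is a minimal sketch, assuming Linux with glibc, where sbrk() is exposed under _DEFAULT_SOURCE. grow_heap is a hypothetical helper. There is no equivalent call for the stack, which is extended by the kernel's page-fault handler.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Grows the heap by `increment` bytes through an explicit request and
   returns how far the program break moved, or -1 on failure. */
intptr_t grow_heap (intptr_t increment)
{
  assert (increment >= 0);              /* this sketch only grows the heap */
  void *before = sbrk (0);              /* current program break */
  if (sbrk (increment) == (void *) -1)  /* explicit request, via brk() */
    return -1;
  void *after = sbrk (0);
  return (intptr_t) ((char *) after - (char *) before);
}
```

grow_heap(4096) should return 4096: the break moves only because the program asked for it.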

Stack clash attacks are not new. Gael [2] and Rafal [3] presented them in 2005 and 2010 respectively. The attack works by allocating a large number of mmaped pages and then performing deep recursion so that the stack overflows and collides with the mmaped pages. After this, Linux (and other OSes) introduced a stack guard page below the stack that cannot be mapped, to defeat this attack. Accessing this area therefore triggers a page fault. Compilers such as gcc also provide stack smashing protection that can detect some stack overflows.

Qualys researchers [1] recently demonstrated various ways to still exploit this on Linux and other Unix-like operating systems. Their research showed that this is possible by jumping over the guard page. This involved:
  • Clash the stack with another memory region through large stack allocations, then bring the stack pointer back into its own region. One way to do this is through recursive calls.
  • Jump over the stack guard page into the other region. Qualys [1] lists various ways to do this, including glibc's vfprintf() function, which can allocate a stack buffer that is not fully written. Refer to [1] for complete details.
  • Smash the other region.
gcc's -fstack-check implementation aims to prevent this but unfortunately fails to do so. Jeff Law from Red Hat posted a series of patches [5] to gcc to handle this. The kernel also increased the stack guard size [6]. There were also glibc patches that fixed some of the associated issues.

In the next post, I will go into the details of how gcc was modified to detect the stack clash vulnerability.

References:

[1] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[2] https://cansecwest.com/core05/memory_vulns_delalleau.pdf
[3] http://invisiblethingslab.com/resources/misc-2010/xorg-large-memory-attacks.pdf
[4] https://access.redhat.com/security/vulnerabilities/stackguard
[5] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01112.html
[6] https://patchwork.kernel.org/patch/9796395/