Wednesday, 20 December 2017

Stack Clash Protection in gcc

In the last post, I explored some of the history behind stack clash vulnerability and how the newer exploits overcame stack guard page. In this post, I am going to look at how gcc detect this with -fstack-clash-protection.

-fstack-clash-protection probes the allocated pages such that we cannot jump over the guard page without accessing it. Thus, if the stack allocated extends into stack guard page, it will be detected by the OS as the probe will access the page. All larger allocations are broken and probed as it is allocated. Thus, when a combination of stack allocation exceeds, stack probes will be performed for each probe interval.

Lets look this with a contrived example. Common use cases for expanding stack dynamically are alloca () and VLAs. In the following code snippet we have alloca () whose argument will only be known at run time. 

char *t;
int foo ( char *arg)
{
  t = alloca (strlen (arg));
  strcpy (t, arg);
  return 1;
}

gcc arm64 will generate asm code as shown below.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
foo:
 stp x29, x30, [sp, -32]!
 add x29, sp, 0
 str x0, [x29, 24]
 bl strlen
 add x0, x0, 15
 and x0, x0, -16
 ldr x1, [x29, 24]
 adrp x3, t
 sub sp, sp, x0
 mov x2, sp
 mov x0, x2
 str x2, [x3, #:lo12:t]
 bl strcpy
 add sp, x29, 0
 mov w0, 1
 ldp x29, x30, [sp], 32
 ret

Lets look at the asm quickly to understand what is going on before we look at  stack clash probing. Procedure calling standard as part of the ABI dictates how arguments are passed. Here, x29 is the frame pointer and x30 is the link register. In line 2, x29 and x30 are saved in the sack. Line 3 updates frame pointer with the stack pointer as part of the prologue of the function foo. In ARM64,  first  arguments is passed via register x0 and the return value is returned via x0 register. Since call to strlen in line 6 will destroy the content of x0, line 4 saves the content of x0 in stack. Line 6 and Line 7 is creating the alignment required for the alloca. Line 8 loads the previously saved content of x0 into x1. Line 10 performs the actual alloca of increasing the stack size. note here that we don't access the stack at this point and can easily be used to bypass the stack guard.

There seems to be some bad register allocation decisions  made in the above generated code. I will leave that discussion for another day and show how -fstack-clash-protection inserts probes and allows us to detect the stack clash vulnerability. The following asm is generated with -fstack-clash-protection.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
foo:
 stp x29, x30, [sp, -32]!
 add x29, sp, 0
 str x19, [sp, 16]
 mov x19, x0
 bl strlen
 add x0, x0, 15
 and x0, x0, -16
 mov x1, sp
 and x2, x0, -4096
 sub x2, sp, x2
 cmp x1, x2
 beq .L3
.L6:
 sub sp, sp, #4096
 mov x1, sp
 str xzr, [sp, 4088]
 cmp x1, x2
 bne .L6
.L3:
 and x0, x0, 4095
 sub sp, sp, x0
 sub x0, x0, #8
 str xzr, [sp, x0]
 mov x2, sp
 mov x1, x19
 adrp x3, t
 mov x0, x2
 str x2, [x3, #:lo12:t]
 bl strcpy
 add sp, x29, 0
 mov w0, 1
 ldr x19, [sp, 16]
 ldp x29, x30, [sp], 32
 ret

In the above code, stack is increased as multiples of 4k (the page size). Line 14 to Line 20 is the loop where  stack is increased by 4k and a store to the stack is performed. There are some subtle target specific optimizations are performed to minimize the performance impact of this as explained in the gcc post. 

No comments:

Post a Comment