Solving 67k binaries with r2pipe
This was a 400 point reverse engineering challenge from EasyCTF 2017. We’re given a zip file containing 67,139 small programs starting from 00000.exe to 10642.exe. The idea is to solve each one in order and to join their output. The end result would lead to the flag. Here’s the challenge description:
Here are 67k binaries, well more accurately 67,139 binaries. Solve every single one, append the results together in order (shouldn’t be too difficult as the binaries are numbered) and then from there I’m sure you can figure it out.
There are probably a hundred ways to solve this challenge, but I decided to give it a go using radare2’s r2pipe.
Disclaimer: This was the first time I’d used r2pipe, so I apologize for the noobness. After much Googling and fiddling, I ended up with a hacky script that solved the challenge. The script can definitely be improved upon, and I’d love to hear suggestions from those who are more experienced with r2pipe or radare2 scripting.
I loaded 00000.exe into radare2 for some static analysis:
entry0 is where the program starts. This function basically breaks down to:
- get a number from the user
- set eax to a value stored at an address (in this case 0x403000). I’ll call this value
- set ecx to a constant value (in this case 0xa1a8a7ed). I’ll call this value
- call a function, I’ll call it
do_op(), that returns the result of an operation (in this case
do_op() looks like:
The return value of this function, I’ll call it
z, is compared against the user’s input. If they are identical it follows a branch that does the following:
- set cl to a value stored at an address (in this case 0x403007)
zby cl bits and store the result in eax
- do a bitwise
andon eax and print out the result
Nothing complicated. The goal is to enter the correct input expected by each binary and concatenate its output to get the flag. I examined a handful of binaries and found the following:
- the values for
ydiffer for each binary
- the operation performed by function
do_op()can be one of
xor; however it’s always
op eax, ecx
So in order to solve each binary without having to run it, I needed to calculate
z, do a shift arithmetic right on it by a certain number of bits, and finally do a bitwise
and on the result.
Scripting the whole thing with r2pipe was actually pretty easy. For instance, here’s how I got the address of
r2p = r2pipe.open(sys.argv) # open the binary r2p.cmd("aaa") # analyze it t = r2p.cmd("aflj") # list all functions; should return two results: entry0 and fcn.???????? # returns the result in JSON d = json.loads(t) # get the results in a dictionary fc_name = d["name"] # there are 2 dictionaries returned; check if the first one is fcn.???????? if fc_name == "entry0": # if not, get it from the second dictionary fc_name = d["name"] print "do_op() is", fc_name
If I run this I get
Basically it’s all about using
r2p.cmd() to run a radare2 command and parsing its output. In this case I’ve chosen to return the results of the command in JSON for easier parsing. If there’s a better way, I’m all ears!
Here’s the full script:
#!/usr/bin/env python import r2pipe import json import sys if __name__ == "__main__": r2p = r2pipe.open(sys.argv) r2p.cmd("aaa") # get the address of do_foo() t = r2p.cmd("aflj") d = json.loads(t) fc_name = d["name"] if fc_name == "entry0": fc_name = d["name"] # determine if sub, add, or xor is used; just want the opcode at this point t = r2p.cmd("pdj 1@%s" %( fc_name)) # <op> eax, ecx d = json.loads(t) ins = d["opcode"] # get the value of EAX t = r2p.cmd("pdj [email protected]+0x1f") # mov eax, dword [0xNNNNNNNN] d = json.loads(t) pointer = d["esil"].split(",") pointer = int(pointer, 16) t = r2p.cmd("pxrj 4@%d" % (pointer,)) # get value pointed to by 0xNNNNNNNN d = json.loads(t) eax = d["value"] # get the value of ECX t = r2p.cmd("pdj [email protected]+0x24") # mov ecx, 0xMMMMMMMM d = json.loads(t) ecx = d["opcode"].split()[-1] ecx = int(ecx, 16) # determine the operation used by do_foo() answer = 0 if "sub" in ins: answer = eax - ecx elif "xor" in ins: answer = eax ^ ecx elif "add" in ins: answer = eax + ecx # get value to use for SAR operation t = r2p.cmd("pdj [email protected]+0x36") # mov cl, byte [0xNNNNNNNN] d = json.loads(t) pointer = d["esil"].split(",") pointer = int(pointer, 16) t = r2p.cmd("pxrj 4@%d" % (pointer,)) # get value pointed to by 0xNNNNNNNN t = t.replace("\\x", "") # get rid of escapes json doesn't like d = json.loads(t) val = d["value"] cl = val & 0xff # get the solution to the challenge solve = answer >> cl solve = solve & 0xff sys.stdout.write("%c" % (solve,))
The script is commented so hopefully it makes sense, It basically figures out what the expected input is and what the binary’s output will be.
To demonstrate the script, I’ve copied a handful of the binaries to a sample directory and ran it:
I’ve attached the completed output and the challenge binaries here if you’d like to play with it as well
Overall this was a great learning experiencef or me. I’ll be doing some more reading on r2pipe to see what else I can do with it.