Frequently when analyzing a new malware sample, I find that some or all of the referenced strings are encrypted in memory - even after the sample is unpacked. Malware authors do this to make analysis harder. You can't just dump the process memory and run strings on it if the strings are encrypted. It's also harder to figure out what each function does if the string references aren't clear.
A client I was working with wanted to know some key pieces of information about an IRC-controlled bot found on their network (MD5 d9b2e9d8826cab35e3287a0a35248f40): what registry keys it uses to autostart (from their own testing they knew it picks a different value every time), and what hosts/ports it connects out to. These values were all stored encrypted and didn't show up in the unpacked EXE or in a dump of the process, and running the sample through auto-analysis tools like Anubis wouldn't provide this information. That's why they called me.
In a previous post, I created an IDA script to decrypt encrypted strings used in the Zeus trojan. That example of decrypting strings was easy because the algorithm was trivial. When the algorithm is more complicated, it can be helpful to load the sample up in the IDA debugger and use the Appcall functionality to decrypt the data.
The Appcall feature was added in IDA 5.6 and lets you call functions in a debugged program. You can treat any function in your disassembly as a built-in IDA function in a script as long as the function has type information in the IDA database. The full Appcall reference has more information.
In this example, there are functions I named XORStringDecrypt and XORStringEncrypt. When the bot wants to access one of the strings, it passes a pointer to that string to XORStringDecrypt, which decrypts the string in-place in memory. After it's used, the code calls XORStringEncrypt to re-encrypt the data. This ensures that the decrypted strings won't show up in a process memory dump. The XORStringDecrypt function itself is trivial, but it does rely on a key that's calculated when the bot starts up:
string_decrypt_key is a 16-byte array that gets set up when the bot starts. It's an MD5 of some other data embedded in the EXE file, but with the first byte OR'd with 0x80 to set the high bit to 1. Because ASCII text never has the high bit set, this ensures that the first encrypted byte of any ASCII string will be higher than 0x7F. The decryption routine checks that each string starts with a value higher than 0x7F before decrypting, to make sure that the string hasn't already been decrypted.
I could have easily just figured out string_decrypt_key in the debugger and then written a decryption routine in IDC like I did for the Zeus post, but I tried something different.
This is a table of strings used by the function that sets up the auto-run entry in the registry (under HKLM\Software\Microsoft\Windows\CurrentVersion\Run). It picks a name for the registry value and a name for the EXE file. Both are chosen at random, and each one looks like a real entry that Windows could use. Here's what it looks like encrypted:
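To make the role of the key concrete, here is a small Python sketch of the scheme as described. Only the MD5 and the OR with 0x80 come from the analysis above. The MD5 input (`seed_data`) is a hypothetical placeholder, and the assumption that byte i of a string is XOR'd with key[i % 16] is mine, not something confirmed from the sample.

```python
import hashlib

def derive_key(seed_data: bytes) -> bytes:
    """Build a 16-byte key: MD5 of the seed data, high bit of byte 0 forced on."""
    key = bytearray(hashlib.md5(seed_data).digest())
    key[0] |= 0x80
    return bytes(key)

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR (assumed): byte i is XOR'd with key[i % len(key)].

    XOR is its own inverse, so the same routine encrypts and decrypts.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def xor_string_decrypt(data: bytes, key: bytes) -> bytes:
    """Decrypt only strings whose first byte still has the high bit set."""
    if not data or data[0] <= 0x7F:
        return data  # empty or already plaintext: leave it alone
    return xor_crypt(data, key)

# Usage: the high-bit "encrypted" marker survives only until decryption.
key = derive_key(b"hypothetical embedded data")
blob = xor_crypt(b"Software\\Microsoft\\Windows", key)
print(hex(blob[0]), xor_string_decrypt(blob, key))
```

Calling xor_string_decrypt twice on the same buffer is harmless, which mirrors why the bot bothers with the 0x80 marker in the first place.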
The script I wrote to decrypt the strings searches the .data section of the EXE for string references that start with a byte value higher than 0x7F. When it finds one, it calls XORStringDecrypt on it. To call XORStringDecrypt, the script calls LocByName to get the address of the function, then uses the Appcall() function to call it:
This script can only be run while debugging the executable, and it will only work correctly after the encryption key is set up. I set a breakpoint immediately after the key is set up and then ran the script. The string data looks much better now:
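The scan-and-call loop itself is simple enough to sketch outside IDA. This is not the author's IDC script: `decrypt_in_place` is a hypothetical stand-in for the Appcall to XORStringDecrypt, `string_refs` stands in for the string references IDA reports in .data, and treating each string as NUL-terminated is an assumption.

```python
from typing import Callable, Dict, Iterable

def decrypt_marked_strings(
    section: bytearray,
    section_base: int,
    string_refs: Iterable[int],
    decrypt_in_place: Callable[[bytearray, int], None],
) -> Dict[int, bytes]:
    """Call the decryptor on every referenced string whose first byte is > 0x7F.

    decrypt_in_place(section, offset) plays the role of the Appcall: it must
    modify `section` in place. Returns {address: decrypted bytes up to NUL}.
    """
    results = {}
    for ea in sorted(set(string_refs)):
        off = ea - section_base
        if not 0 <= off < len(section):
            continue  # reference points outside this section
        if section[off] <= 0x7F:
            continue  # no high-bit marker: not encrypted (or already decrypted)
        decrypt_in_place(section, off)
        end = section.find(b"\x00", off)
        results[ea] = bytes(section[off:end if end != -1 else len(section)])
    return results
```

In the real script the callback is the debugged process's own XORStringDecrypt, so the sketch never needs to know the key.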
This is just one of the many interesting possibilities for using IDA's awesome Appcall functionality.
Sunday, July 25, 2010
Wednesday, July 14, 2010
Unpacking VBInject/VBCrypt/RunPE
VBInject (also called VBCrypt or RunPE) is a family of packers (actually crypters... VBInject will increase the size of the target file) written in Visual Basic 6.0. It's a pain in the ass for antivirus vendors because it's difficult to tell what's actually inside of the packed file. Unless someone has already unpacked a specific sample and created a signature, most AV products will flag a VBInject-packed executable as just being VBInject. Detecting VBInject itself is easy - detecting what's inside is more difficult. This means that any removal instructions you find for VBInject are going to be wrong - a VBInject-packed file could contain any other piece of malware. The whole reason hackers use VBInject is to avoid having their malware detected by AV. When an AV vendor scans a sample and finds that it's detected as VBInject, most of the time they won't bother cracking it open to see what evil lurks within.
VBInject and VBCrypt are names invented by the industry, but in the underground these tools are generally referred to as RunPE, so I'll use that name. Also it's easier to type.
What makes it so difficult to unpack and analyze a RunPE-packed executable? There are a few key differences between RunPE and traditional packers/crypters/protectors:
- The way that it works. The packed executable re-launches itself as a new process and then overwrites that process's memory. This is drastically different from most other packers, which overwrite their own process's memory. Generic unpacking tools are often ill-equipped to deal with this.
- The unpacking code itself is written in VB6, which ends up as interpreted bytecode (p-code). The only way to figure out the unpacking algorithm is to reverse engineer this code. It's possible to build VB apps as native code, but for whatever reason, many RunPE variants will only run if they're built as p-code and not as native executables. This also makes it harder to reverse engineer... (There are now some versions that use VB.NET, which isn't p-code or native, but .NET CIL)
- It's ridiculously easy to modify and create new versions if you can read VB code, which doesn't exactly take a PhD. Because the packing/unpacking code can be modified so easily, you never know which algorithm will be used to extract the original EXE file without reverse engineering the VB code (which could be p-code, native assembly, or even .NET IL). It could be a simple XOR or an actual decryption or decompression routine. The VB overhead adds an extra layer of obfuscation to get in the way of reverse engineering.
There are TONS of variants of RunPE out there, and every wanna-be packer/crypter writer writes their own. It's VB6, anyone can do it, just search and replace a few strings and you have a custom copy. There are even RunPE generators that contain fabulous graphics:
This one advertises itself as a FUD RunPE generator. FUD means "fully undetectable" to the kids in the underground - it means no AV software will detect your beautiful malware (although, packers advertised as FUD don't stay that way for long). All this thing does is spit out a copy of the RunPE source (a VB.NET version) with randomized class and function names, with some garbage code thrown in. It's enough to fool many AV products. I won't say which ones, to avoid insulting my friends who work at those companies.
So where did RunPE come from? Who knows... the first reference I saw to it was in early 2009, it started getting really popular in mid-2009, and now everyone is using it. It's been evolving since the first public release, too. For example, the first version used Kernel32.WriteProcessMemory to write memory to the new process. Later versions used Ntdll.NtWriteVirtualMemory to screw up my breakpoints. Newer versions don't even use the regular VB6 DLL importing functionality and actually use thunks written in assembly code to load DLL modules and call external functions the hard way (check out NTRunPE, cNtPEL, etc). Every few months an even-more-obfuscated version will appear.
Behind all of the obfuscation, they all work about the same way:
- Decrypt/unpack/unobfuscate the original EXE file in memory (stored as a byte array in the VB code)
- Call CreateProcess() on a target EXE (usually the same EXE that's currently executing) using the CREATE_SUSPENDED flag. This maps the executable into memory and it's ready to execute, but the entry point hasn't executed yet.
- Call NtUnmapViewOfSection() to unmap the virtual address space used by the new process
- Call VirtualAllocEx() to re-allocate the memory in the process's address space to the correct size (the size of the new EXE)
- Call WriteProcessMemory() to write the PE headers and each section of the new EXE (unpacked in step 1) to the virtual addresses where they expect to be (calling VirtualProtectEx() to set the protection flags that each section needs).
- Call SetThreadContext() and then ResumeThread() to start executing the new executable.
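Because every variant ends up making this same sequence of calls, it also makes a decent behavioral fingerprint. Here is a sketch of that idea (not something from the original post): it walks a hypothetical API-call trace of `(api_name, creation_flags)` pairs and reports RunPE-like behavior when a CREATE_SUSPENDED process creation is followed, in order, by unmap, allocate, write, set-context and resume calls. Treating each kernel32 call and the ntdll function beneath it as the same step is an assumption about what a tracer would log.

```python
from typing import Iterable, Tuple

CREATE_SUSPENDED = 0x00000004

# The steps listed above, in order. Each step accepts either the documented
# kernel32 call or the ntdll function that a tracer might log instead.
RUNPE_STEPS = [
    {"CreateProcessA", "CreateProcessW"},
    {"NtUnmapViewOfSection", "ZwUnmapViewOfSection"},
    {"VirtualAllocEx", "NtAllocateVirtualMemory"},
    {"WriteProcessMemory", "NtWriteVirtualMemory"},
    {"SetThreadContext", "NtSetContextThread"},
    {"ResumeThread", "NtResumeThread"},
]

def looks_like_runpe(trace: Iterable[Tuple[str, int]]) -> bool:
    """Return True if the trace contains all RunPE steps in order.

    creation_flags is only checked for CreateProcess* calls.
    """
    step = 0
    for name, flags in trace:
        if step == len(RUNPE_STEPS):
            break
        if name in RUNPE_STEPS[step]:
            if step == 0 and not flags & CREATE_SUSPENDED:
                continue  # an ordinary CreateProcess call, not RunPE
            step += 1
    return step == len(RUNPE_STEPS)
```

Extra calls in between (repeated writes, VirtualProtectEx, unrelated APIs) are simply skipped, so the check tolerates the usual obfuscation noise.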
To see how this works, look at the relevant code from the original RunPE:
This version didn't perform any encryption or obfuscating of the EXE code, all of the imports were visible from various PE tools, and it used MessageBox calls to report errors. Running strings on the binary would give you a good idea of what's going on. Overall, it's not very stealthy.
Now here's a more recent and highly obfuscated version:
You can see that the new version uses obfuscated DLL calls - so running strings or checking the imports with dumpbin won't give you much of a clue about what this thing is doing. Even running it through IDA with VB6 scripts or a VB6 decompiler won't help much.
That GetProcAddress call isn't making a DLL function call - it's actually traversing the module's export directory in memory. In VB code! That code is beyond the scope of this article, but you can find a version of it here (Google for cNtPEL if the link is dead).
So now we know how RunPE works. Because it extracts the code into a newly-launched process and not its own process, many auto-unpackers will fail to detect that the sample is even packed - it doesn't modify its own memory space. It requires some special techniques to unpack. It's not hard though - this is actually one of the easiest packers to unpack once you know how. That's because it doesn't require any rebuilding of import tables, finding the original entry point (OEP), fixing relocations, or realigning the sections.
The trick (using any debugger) is to first set a breakpoint on CreateProcessW. This will let us catch calls to CreateProcessA as well. This breakpoint won't work if someone writes a version of RunPE that talks directly to the CSRSS subsystem to launch a process, but let's just hope that never happens. After breaking on CreateProcessW, we can make sure that the 6th argument (at ESP+24) has bit 0x00000004 set - this is the value for the CREATE_SUSPENDED flag. If this flag isn't set, something else is going on other than RunPE.
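The flag check is just a stack read. Here is a minimal Python sketch, assuming you have a snapshot of stack memory starting at ESP taken on the first instruction of CreateProcessW (how you obtain that snapshot depends on your debugger, so `stack` here is hypothetical). On entry, [ESP] holds the return address, so argument n lives at ESP+4*n, and dwCreationFlags, the 6th parameter, lands at ESP+24.

```python
import struct

CREATE_SUSPENDED = 0x00000004

def stack_arg(stack: bytes, n: int) -> int:
    """Return the n-th (1-based) DWORD argument of a 32-bit stdcall function.

    `stack` is a snapshot of memory starting at ESP, captured on the first
    instruction of the function: [ESP] is the return address, so argument n
    is at ESP + 4*n.
    """
    return struct.unpack_from("<I", stack, 4 * n)[0]

def is_suspended_create(stack: bytes) -> bool:
    """dwCreationFlags is CreateProcessW's 6th parameter, i.e. ESP+24."""
    return bool(stack_arg(stack, 6) & CREATE_SUSPENDED)
```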
Now we can step out of that function and set a breakpoint on NtWriteVirtualMemory (WriteProcessMemory calls that, so we'll catch versions that use that call too). The reason to wait until after CreateProcessW finishes is because NtWriteVirtualMemory is called a few times during the creation of a process - we only want to break when RunPE is calling it. When the breakpoint is set, continue executing. The next break should be when RunPE is writing the new EXE bytes over the new process that was created. The 3rd argument to NtWriteVirtualMemory (at ESP+12, since the return address sits at ESP on entry) is a pointer to the buffer that's getting copied from. This will be the beginning of the EXE file. Examine the memory with your debugger to make sure it looks like an EXE file.
This EXE file is (probably) in disk format - it hasn't been mapped to memory. RunPE does the mapping itself. This means we can just dump the bytes to a file. The tricky part is figuring out how many bytes to write. The quick and dirty method is to use the PE optional header's SizeOfImage field, but that will get you some extra bytes because the on-disk size is almost always less than SizeOfImage. My preferred method is a bit more elegant: I calculate the size of the PE file by using the PointerToRawData and SizeOfRawData members of the last section header. It's possible that this can miss bytes in a PE file with stuff tacked on the end, but I haven't run into any PE files packed with RunPE that do this. In fact, the RunPE code itself stops copying the bytes at the end of the last section.
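The "does it look like an EXE" check can be automated too. This sketch assumes the breakpoint is on the first instruction of the 32-bit NtWriteVirtualMemory stub, so [ESP] is the return address and the arguments follow as ProcessHandle, BaseAddress, Buffer, which puts Buffer at ESP+12. The `read_mem(address, size)` callback is a hypothetical stand-in for your debugger's memory-read function.

```python
import struct
from typing import Callable

def write_buffer_ptr(stack: bytes) -> int:
    """Buffer (3rd argument) of NtWriteVirtualMemory, at ESP+12 on entry."""
    return struct.unpack_from("<I", stack, 12)[0]

def looks_like_pe(read_mem: Callable[[int, int], bytes], addr: int) -> bool:
    """Check for an MZ header whose e_lfanew points at a 'PE\\0\\0' signature."""
    dos = read_mem(addr, 0x40)
    if len(dos) < 0x40 or dos[:2] != b"MZ":
        return False
    e_lfanew = struct.unpack_from("<I", dos, 0x3C)[0]  # offset of the NT headers
    return read_mem(addr + e_lfanew, 4) == b"PE\x00\x00"
```

Checking the PE signature as well as "MZ" weeds out the occasional write that merely starts with those two bytes.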
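Here is a Python sketch of the size calculation. The field offsets are the standard PE/COFF ones (PointerToRawData is the winnt.h name of the raw-data offset field). Like the method described above, it assumes the last section header is also the last section on disk and ignores any overlay data appended after it.

```python
import struct

def pe_disk_size(pe: bytes) -> int:
    """On-disk size of a PE: PointerToRawData + SizeOfRawData of the last section."""
    e_lfanew = struct.unpack_from("<I", pe, 0x3C)[0]
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4  # IMAGE_FILE_HEADER follows the 4-byte signature
    num_sections = struct.unpack_from("<H", pe, coff + 2)[0]
    size_opt_header = struct.unpack_from("<H", pe, coff + 16)[0]
    if num_sections == 0:
        raise ValueError("no sections")
    # The section table starts right after the optional header;
    # each IMAGE_SECTION_HEADER is 40 bytes.
    last = coff + 20 + size_opt_header + 40 * (num_sections - 1)
    size_raw, ptr_raw = struct.unpack_from("<II", pe, last + 16)
    return ptr_raw + size_raw
```

Only the headers are read, so this works on the first page or so of the buffer; you then dump that many bytes starting at the Buffer pointer.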
After the 2nd time I had to go through this process, I wrote an IDC script for IDA to automate the process for me. It would be possible to do this in any scriptable debugger, the Win32 debugging API, or a toolkit like TitanEngine or EasyHook, but IDA has become my debugger of choice lately so I used that. Here's the script:
It will launch the debugger, set some breakpoints, then write the unpacked PE file (if found) to "unpacked.exe". This script works on all of the in-the-wild RunPE-packed malware I tried it on, but there are a lot of ways around it. That's why it does error checking and stops if anything strange happens.
When the script finishes, the debugger is still running and both processes are suspended. It's up to you to terminate the debugged process (and the child process), but that part could easily be automated as well.
Tuesday, July 6, 2010
Decrypting encrypted strings in Zeus
There are a few strings in the latest Zeus bot that are stored in an encrypted format. When a function needs one of them, it calls another function to decrypt the string and store it in a buffer on the stack. There are only about 50 of these encrypted strings, and as I found out, they aren't very interesting. They aren't exposing any hidden functionality - I suspect they were just included to make reversing the bot a little bit harder.
The strings are referenced by a table where each entry has a structure that looks like this:
struct { ushort XorKey; ushort Length; char *EncryptedString; } encrypted_string;
In IDA, the table looks like this at first:
To figure out what these strings were, I first reverse-engineered the decryption routine. There were actually two separate routines - one that returns the strings as ASCII and the other as wide characters. This is the decryption routine for returning ASCII data:
It's very simple - each encrypted character gets the string length subtracted from it, then XOR'd with an 8-bit key which is different for each string.
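In Python, the table entry and the ASCII decryption look roughly like this. Assumptions, none of them confirmed from the binary: the entry is packed as two little-endian ushorts followed by a 32-bit pointer (8 bytes, no padding); both the subtraction and the XOR wrap at 8 bits; and only the low byte of the 16-bit XorKey field is used. `zeus_encrypt` is a hypothetical inverse, included only to build test data.

```python
import struct
from typing import Tuple

# ushort XorKey; ushort Length; char *EncryptedString (32-bit pointer)
ENTRY = struct.Struct("<HHI")

def parse_entry(table: bytes, index: int) -> Tuple[int, int, int]:
    """Read (XorKey, Length, EncryptedString pointer) for one table entry."""
    return ENTRY.unpack_from(table, index * ENTRY.size)

def zeus_decrypt(key: int, length: int, data: bytes) -> bytes:
    """Subtract the string length from each byte, then XOR with the 8-bit key."""
    k = key & 0xFF
    return bytes(((b - length) & 0xFF) ^ k for b in data[:length])

def zeus_encrypt(key: int, plain: bytes) -> bytes:
    """Inverse of zeus_decrypt: XOR with the key, then add the length."""
    k = key & 0xFF
    n = len(plain)
    return bytes(((p ^ k) + n) & 0xFF for p in plain)
```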
To get a decrypted string, a function will put the requested string's index into EAX and put a pointer to the buffer that will hold the decrypted string in EDI (ESI is used for the wide character version), then call the function I labeled GetEncryptedString(). The actual code used by the bot will PUSH a 32-bit immediate value and then POP it into EAX:
To make it easier to understand the disassembly, I wrote an IDC script that will first decrypt and label the strings in the encrypted string table, then search the code for references to the GetEncryptedStringA() and GetEncryptedStringW() functions to add comments to those.
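The call-site search can be approximated with a raw byte scan. This is a rough Python sketch, not the author's IDC script: it looks for `push imm32` (0x68), `pop eax` (0x58), and then a `call rel32` (0xE8) to the decryption function within a few bytes. The `max_gap` window is an arbitrary assumption, and a byte scan can report false positives that a disassembler-based search would not.

```python
import struct
from typing import List, Tuple

PUSH_IMM32 = 0x68
POP_EAX = 0x58
CALL_REL32 = 0xE8

def find_string_index_calls(code: bytes, code_base: int, target: int,
                            max_gap: int = 16) -> List[Tuple[int, int]]:
    """Return (call_address, string_index) for `push imm32; pop eax; ...; call target`."""
    hits = []
    for i in range(len(code) - 5):
        if code[i] != PUSH_IMM32 or code[i + 5] != POP_EAX:
            continue
        index = struct.unpack_from("<I", code, i + 1)[0]
        start = i + 6
        # Look a short distance ahead for a relative call to the target.
        for j in range(start, min(start + max_gap, len(code) - 4)):
            if code[j] != CALL_REL32:
                continue
            rel = struct.unpack_from("<i", code, j + 1)[0]
            if code_base + j + 5 + rel == target:  # rel32 is relative to the next instruction
                hits.append((code_base + j, index))
                break
    return hits
```

Each hit gives the address to comment and the index to look up in the decrypted string table.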
Here's the script:
After running the script, the string table now looks like this:
And code that calls the GetEncryptedStringA/W functions looks like this:
This makes reading and annotating the disassembly a little easier.
Friday, July 2, 2010
How Zeus finds the base address of kernel32.dll
I found this function in the new version (v1.4) of the Zeus/Zbot data stealing trojan:
This function uses the PEB (Process Environment Block) of the current process (a pointer to it is stored at fs:[30h]) to locate a linked list of loaded modules. The PEB contains a member called Ldr that is a pointer to a PEB_LDR_DATA structure. This structure contains a set of LIST_ENTRY structures that point to linked lists of LDR_DATA_TABLE_ENTRY structures. The InMemoryOrderModuleList linked list is used to enumerate the loaded modules in the order they're laid out in memory.
For each module in the list, it does a simple hash of the first 24 bytes (or 12 Unicode characters) and checks the hash against a certain value (0x6A4ABC5B). This value happens to be the hash for "kernel32.dll". Once this module is located, the base address is returned in the EAX register.
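The post doesn't spell out the hash, so this Python sketch assumes the common ROR-13 module-name hash found in Stephen Fewer's shellcode: rotate the running value right by 13, then add the next byte of the UTF-16LE name, with bytes >= 'a' shifted down by 0x20 to normalize case. With those assumptions, hashing the first 24 bytes of "kernel32.dll" gives 0x6A4ABC5B. The `(name, base)` module list is a hypothetical stand-in for walking InMemoryOrderModuleList.

```python
from typing import Iterable, Optional, Tuple

KERNEL32_HASH = 0x6A4ABC5B

def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def module_name_hash(name: str, length: int = 24) -> int:
    """ROR-13 hash over the first `length` bytes of the UTF-16LE module name.

    Shorter names just hash fewer bytes here; real shellcode would keep
    reading past the end of the string.
    """
    h = 0
    for b in name.encode("utf-16-le")[:length]:
        if b >= ord("a"):
            b -= 0x20  # normalize lowercase letters to uppercase
        h = (ror32(h, 13) + b) & 0xFFFFFFFF
    return h

def find_module_base(modules: Iterable[Tuple[str, int]],
                     wanted: int = KERNEL32_HASH) -> Optional[int]:
    """Return the base address of the first module whose name hash matches."""
    for name, base in modules:
        if module_name_hash(name) == wanted:
            return base
    return None
```

Searching for the constant 0x6A4ABC5B, as the post suggests, is the faster route. The sketch only shows why the same value appears in so many unrelated binaries.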
The technique of enumerating the loaded modules using the PEB is commonly used by malware to locate the base address (which is also the module handle) of certain DLL files to avoid using the GetModuleHandle API call. There are a few reasons to do this:
1. Some malware auto-analysis tools and generic unpackers hook calls to GetModuleHandle
2. The address of GetModuleHandle may be unknown - this can happen if the code doesn't know what context it's running in, for example in shellcode and injected threads
3. To make static analysis of the code more difficult
After some Googling, I found out that this code was originally created by Stephen Fewer of Harmony Security in June of 2009 and described in his blog post. It was intended for use in Win32 shellcode, and it's no surprise that malware authors are now using it.
Of course, I didn't bother with the Googling until after I fully analyzed the function... I could have saved some time if I just searched for 0x6A4ABC5B like a good malware analyst generally does when finding a function with a hardcoded value... Lesson learned!
Other than Zeus, this code is also used in the Win32.Annunaki virus according to this Chinese analysis from ByteHero.com