Frequently when analyzing a new malware sample, I find that some or all of the referenced strings are encrypted in memory - even after the sample is unpacked. Malware authors do this to make analysis harder. You can't just dump the process memory and run strings on it if the strings are encrypted. It's also harder to figure out what each function does if the string references aren't clear.
A client I was working with wanted to know some key pieces of information about an IRC-controlled bot found on their network (MD5 d9b2e9d8826cab35e3287a0a35248f40) - what registry keys it uses to autostart (from their own testing they knew it picks a different value every time), and what hosts/ports it connects out to. These values were all stored encrypted and didn't show up in the unpacked EXE or a dump of the process, and running the sample through auto-analysis tools like Anubis didn't provide this information - that's why they called me.
In a previous post, I created an IDA script to decrypt encrypted strings used in the Zeus trojan. That example of decrypting strings was easy because the algorithm was trivial. When the algorithm is more complicated, it can be helpful to load the sample up in the IDA debugger and use the Appcall functionality to decrypt the data.
The Appcall feature was added in IDA 5.6 and lets you call functions in a debugged program. You can treat any function in your disassembly as a built-in IDA function in a script as long as the function has type information in the IDA database. The full Appcall reference has more information.
In this example, there are functions I named XORStringDecrypt and XORStringEncrypt. When the bot wants to access one of the strings, it passes a pointer to that string to XORStringDecrypt, which decrypts the string in-place in memory. After it's used, the code calls XORStringEncrypt to re-encrypt the data. This ensures that the decrypted strings won't show up in a process memory dump. The XORStringDecrypt function itself is trivial, but it does rely on a key that's calculated when the bot starts up:
string_decrypt_key is a 16-byte array that gets set up when the bot starts. It's an MD5 of some other data embedded in the EXE file, but with the first byte OR'd with 0x80 to set the high bit to 1. This ensures that the first encrypted byte of any ASCII text will be higher than 0x7F. The decryption routine checks that each string starts with a value higher than 0x7F before decrypting, to make sure that the string hasn't already been decrypted.
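The key setup and the high-bit check described above can be sketched in plain Python. This is a minimal illustration, not the bot's actual routine: it assumes the XOR simply cycles through the 16 key bytes and that the string length is known, and the names make_key, decrypt_string and encrypt_string and the seed value are hypothetical.

```python
import hashlib


def make_key(seed: bytes) -> bytes:
    """Return the MD5 of seed with the high bit of the first byte forced on."""
    digest = bytearray(hashlib.md5(seed).digest())  # 16 bytes
    digest[0] |= 0x80
    return bytes(digest)


def is_encrypted(buf: bytes) -> bool:
    """Encrypted strings are recognized by a first byte above 0x7F."""
    return len(buf) > 0 and buf[0] > 0x7F


def xor_with_key(buf: bytes, key: bytes) -> bytes:
    # ASSUMPTION: a repeating-key XOR; the real routine may differ.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(buf))


def decrypt_string(buf: bytes, key: bytes) -> bytes:
    """Decrypt only if the marker byte says the string is still encrypted."""
    return xor_with_key(buf, key) if is_encrypted(buf) else bytes(buf)


def encrypt_string(buf: bytes, key: bytes) -> bytes:
    """Encrypt only if the string is currently plain ASCII."""
    return bytes(buf) if is_encrypted(buf) else xor_with_key(buf, key)
```

Because the first key byte always has its high bit set and ASCII characters never do, the first byte of any encrypted ASCII string ends up above 0x7F, which is what makes the "already decrypted?" check work.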
I could have easily just figured out string_decrypt_key in the debugger and then written a decryption routine in IDC like I did for the Zeus post, but I tried something different.
This is a table of strings used by the function that sets up the auto-run entry in the registry (under HKLM\Software\Microsoft\Windows\CurrentVersion\Run) - it picks a name for the key and a name for the EXE file. The name and key are randomly chosen and each one looks like a real entry that could be used by Windows. Here's what it looks like encrypted:
The script I wrote to decrypt the strings searches the .data section of the EXE for string references that start with an ASCII code higher than 0x7F. When it finds one, it calls XORStringDecrypt on it. To call XORStringDecrypt, the script calls LocByName to get the offset of the function, then uses the Appcall() function to call it:
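As a rough illustration of the scanning logic only (not the IDA script itself), the search can be sketched in plain Python over a copy of the .data bytes. It assumes 32-bit little-endian pointers aligned on 4-byte boundaries; the names find_encrypted_refs and decrypt_all are hypothetical, and call_decrypt is a stand-in for whatever actually invokes XORStringDecrypt in the debugged process.

```python
import struct


def find_encrypted_refs(image: bytes, base: int):
    """Yield addresses that a pointer in image refers to and whose
    first byte is above 0x7F (i.e. looks like an encrypted string)."""
    end = base + len(image)
    # ASSUMPTION: pointers are 4 bytes, little-endian, 4-byte aligned.
    for off in range(0, len(image) - 3, 4):
        (ptr,) = struct.unpack_from("<I", image, off)
        if base <= ptr < end and image[ptr - base] > 0x7F:
            yield ptr


def decrypt_all(image: bytes, base: int, call_decrypt):
    """Call call_decrypt(address) once per encrypted string found.

    call_decrypt is supplied by the caller; inside a debugger session
    it would wrap the call into the sample's own decryption function.
    """
    targets = sorted(set(find_encrypted_refs(image, base)))
    for addr in targets:
        call_decrypt(addr)
    return targets
```

Any 4-byte value that happens to fall inside the section and land on a high byte will be treated as a string reference, so a scan like this can produce false positives.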
This script can only be run while debugging the executable, and it will only work correctly after the encryption key is set up. I set a breakpoint immediately after the key is set up and then ran the script. The string data looks much better now:
This is just one of the many interesting possibilities for using IDA's awesome Appcall functionality.