Reverse engineering tool for virtualization wrappers
The Virtual Deobfuscator was developed as part of the DARPA Cyber Fast Track program. The goal was to create a tool that could remove virtual machine (VM) based protections from malware. I developed a prototype version that looks very promising. Virtual machine protections are a relatively new form of obfuscation. They work by translating sections of a binary’s original machine code into bytecode for a custom VM. This transformation is destructive: the original machine code for those sections is lost. The VM itself is embedded in the protected binary and is used at runtime to interpret the instructions that were converted to bytecode. The Virtual Deobfuscator analyzes a runtrace and filters out the VM’s own processing instructions, leaving a reverse engineer with a bytecode version of the original binary. It doesn’t need to be tailored to the particular VM being analyzed, and so far it has worked on every VM interpreter I have tested it on.
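To make the idea concrete, here is a deliberately tiny, hypothetical stack VM in Python. The opcodes, handler names, and trace strings are invented for illustration; real protection VMs are much larger and are usually obfuscated themselves. The single original operation "store 2 + 3" becomes four bytecodes, and every bytecode passes through the same dispatch code, which is exactly the kind of repetition a runtrace of the interpreter exposes.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical opcodes for a toy stack machine (not any real protector's format).
PUSH, ADD, STORE = 0, 1, 2


def run(bytecode: list[tuple[int, int]]) -> tuple[dict[int, int], list[str]]:
    """Interpret (opcode, operand) pairs; return final memory and a coarse trace."""
    stack: list[int] = []
    memory: dict[int, int] = {}
    trace: list[str] = []  # stands in for what a debugger runtrace would record

    def h_push(arg: int) -> None:
        stack.append(arg)

    def h_add(_: int) -> None:
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)

    def h_store(arg: int) -> None:
        memory[arg] = stack.pop()

    handlers: dict[int, Callable[[int], None]] = {PUSH: h_push, ADD: h_add, STORE: h_store}

    vpc = 0  # virtual program counter
    while vpc < len(bytecode):
        opcode, arg = bytecode[vpc]      # fetch
        trace.append("dispatch")         # identical dispatcher work on every iteration
        handlers[opcode](arg)            # decode and execute the handler
        trace.append(f"handler_{opcode}")
        vpc += 1
    return memory, trace
```

For example, `run([(PUSH, 2), (PUSH, 3), (ADD, 0), (STORE, 0)])` leaves `{0: 5}` in memory, and its trace alternates `"dispatch"` with one handler entry per bytecode.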
First, make an output directory called output in the main VirtualDeobfuscator source directory (neither the name nor the location is required). Change into it and run the following command:
python ../VirtualDeobfuscator.py -i ../example/olly_loop_eax.txt -d 1 -t verify.txt
Three files will be generated: vd.xml, vd_IR.txt, and verify.txt. The first one is the converted trace database; this is what gets used for clustering. Perform clustering with the command:
python ../VirtualDeobfuscator.py -c -d 1
This command will generate a lot of files. The one you really care about is called final_assembly.txt. You can find out more about the contents of this file, as well as more details about the Virtual Deobfuscator, in doc/WhitePaper.docx.
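If you want to script the two documented steps, a minimal Python driver could look like the sketch below. It reuses exactly the flags from the two commands above and assumes the same layout (an output directory inside the source directory). It invokes the interpreter by the name `python`, as the commands do, because the tool may expect a specific Python version; the helper names `parse_cmd`, `cluster_cmd`, and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path

PYTHON = "python"  # same interpreter name the documented commands use
SCRIPT = Path("..") / "VirtualDeobfuscator.py"  # assumes we run from ./output


def parse_cmd(trace: str, verify: str = "verify.txt") -> list[str]:
    """Argument list for the documented trace-conversion command."""
    return [PYTHON, str(SCRIPT), "-i", trace, "-d", "1", "-t", verify]


def cluster_cmd() -> list[str]:
    """Argument list for the documented clustering command."""
    return [PYTHON, str(SCRIPT), "-c", "-d", "1"]


def run_pipeline(trace: str, workdir: Path = Path(".")) -> Path:
    """Run both steps in workdir; raise CalledProcessError if either step fails."""
    subprocess.run(parse_cmd(trace), cwd=workdir, check=True)
    subprocess.run(cluster_cmd(), cwd=workdir, check=True)
    return workdir / "final_assembly.txt"
```

Calling `run_pipeline("../example/olly_loop_eax.txt")` from the output directory mirrors the two commands above and returns the path of final_assembly.txt.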
The Virtual Deobfuscator is based on pattern matching. It analyzes a runtrace and matches patterns of instructions, grouping them into clusters. This process continues recursively until no more instructions or clusters can be grouped into larger clusters. The instructions that remain unclustered contain the interpreted bytecodes; they are the instructions actually executed by the VM as it processed the bytecodes. Because protection VMs generally use RISC-based architectures, their instruction sets are simpler. As a result, most instructions from the original program are represented by multiple bytecodes, so the post-clustering instruction trace contains many more instructions than the original binary did. To clean it up, I run the instructions through a peephole optimizer that removes redundant instructions and produces something closer to the original.
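The recursive grouping idea can be sketched with a simple pair-merging scheme in the style of Re-Pair/byte-pair encoding: repeatedly replace the most frequent adjacent pair of tokens with a new cluster token until no pair repeats. This is a stand-in chosen for illustration, not the Virtual Deobfuscator's actual clustering algorithm; the function names and the `C0`, `C1`, ... cluster labels are hypothetical.

```python
from collections import Counter


def cluster(trace: list[str], min_count: int = 2) -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Merge the most frequent adjacent pair into a cluster until no pair repeats."""
    seq = list(trace)
    clusters: dict[str, tuple[str, str]] = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"C{len(clusters)}"  # hypothetical cluster label
        clusters[name] = pair      # clusters may contain clusters (recursion)
        merged: list[str] = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                merged.append(name)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged  # strictly shorter, so the loop terminates
    return seq, clusters


def expand(token: str, clusters: dict[str, tuple[str, str]]) -> list[str]:
    """Flatten a cluster token back into the instructions it covers."""
    if token not in clusters:
        return [token]
    left, right = clusters[token]
    return expand(left, clusters) + expand(right, clusters)


def unclustered(seq: list[str], clusters: dict[str, tuple[str, str]]) -> list[str]:
    """Tokens never absorbed into a cluster: the candidates for bytecode work."""
    return [tok for tok in seq if tok not in clusters]
```

On a trace made of three passes through a three-instruction dispatcher, each followed by one handler instruction, the dispatcher collapses into a two-level cluster and only the three handler instructions remain unclustered.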
The Virtual Deobfuscator’s parser can handle traces from three popular debugging tools: WinDbg, OllyDbg, and Immunity Debugger. It can easily be extended to work with traces generated by other tools. The parser converts traces to a normalized XML format for later processing, so tool developers can also modify their tools to emit that format directly. There’s no DTD for our XML format, but it’s extremely simple.
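As an illustration of the normalization step, the sketch below converts trace lines into a small XML tree with Python's standard `xml.etree.ElementTree`. Both the input format (an address, whitespace, then the disassembled instruction) and the `trace`/`instr`/`addr` names are hypothetical; they are not the actual vd.xml schema, and real WinDbg, OllyDbg, or Immunity traces carry extra columns that would need a per-tool parser.

```python
import xml.etree.ElementTree as ET


def normalize(lines: list[str], root_tag: str = "trace") -> ET.Element:
    """Turn 'ADDRESS INSTRUCTION' lines into a hypothetical normalized XML tree."""
    root = ET.Element(root_tag)
    for line in lines:
        parts = line.split(None, 1)  # address, then the rest of the line
        if len(parts) < 2:
            continue                 # skip blank or malformed lines
        addr, text = parts
        step = ET.SubElement(root, "instr", addr=addr)
        step.text = text.strip()
    return root
```

`ET.tostring(normalize(lines), encoding="unicode")` serializes the tree; a parser for another debugger only needs to produce the same shape.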
The repackaging step uses the output of the clustering process to create a binary fragment containing the original x86 program code without the VM. This allows for further analysis in disassemblers such as IDA Pro. I generate the binary by assembling the “sections” of assembly code created by the Virtual Deobfuscator. This code is stored in the file final_assembly_nasm.asm. I assemble it using the Netwide Assembler (NASM).
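A minimal sketch of the assembly step, assuming NASM is on the PATH and that a flat binary (`-f bin`) is the desired output. In that format NASM defaults to 16-bit code, so this assumes the .asm file starts with a `BITS 32` directive. The output name final_assembly.bin and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def nasm_command(src: Path, out: Path) -> list[str]:
    """NASM invocation producing a flat binary with no object-file headers."""
    return ["nasm", "-f", "bin", "-o", str(out), str(src)]


def assemble(src: Path = Path("final_assembly_nasm.asm"),
             out: Path = Path("final_assembly.bin")) -> Path:
    """Assemble src into out; raise if NASM is missing or reports an error."""
    if shutil.which("nasm") is None:
        raise FileNotFoundError("nasm not found on PATH")
    subprocess.run(nasm_command(src, out), check=True)
    return out
```

The resulting file can then be loaded into IDA Pro as a raw binary, choosing the base address and 32-bit mode manually.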
Once the runtrace has been reduced to just the bytecode instructions and packaged as a binary, we run the code through a peephole optimizer, implemented as an IDA Pro Python script. This takes care of any remaining redundancy in the code (remember, the bytecodes are for a RISC machine, so there will be redundancy compared to the original CISC instructions). This step also helps remove simple obfuscations.
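The real optimizer is an IDA Pro script; the standalone sketch below only illustrates the peephole idea on plain-text 32-bit x86 instructions and uses no IDA API. It applies two hypothetical rules that change neither registers nor flags: drop `mov r, r`, and drop an adjacent `push X` / `pop X` pair. It repeats until nothing changes, because removing one pair can expose another.

```python
def _split(ins: str) -> tuple[str, list[str]]:
    """Split 'mov eax, ebx' into ('mov', ['eax', 'ebx'])."""
    mnemonic, _, rest = ins.strip().lower().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return mnemonic, operands


def peephole(code: list[str]) -> list[str]:
    """Remove self-moves and push/pop pairs of the same operand until a fixpoint."""
    changed = True
    while changed:
        changed = False
        out: list[str] = []
        i = 0
        while i < len(code):
            mn, ops = _split(code[i])
            if mn == "mov" and len(ops) == 2 and ops[0] == ops[1]:
                changed = True  # mov r, r is a no-op in 32-bit code
                i += 1
                continue
            if i + 1 < len(code):
                mn2, ops2 = _split(code[i + 1])
                if mn == "push" and mn2 == "pop" and ops == ops2:
                    changed = True  # push X; pop X leaves X and ESP unchanged
                    i += 2
                    continue
            out.append(code[i])
            i += 1
        code = out
    return code
```

For example, `peephole(["push eax", "push ebx", "pop ebx", "pop eax", "mov ecx, ecx", "add eax, 1"])` reduces to `["add eax, 1"]` after two passes.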
Jason Raber
jason.raber@hexeffect.com
HexEffect, LLC
www.hexeffect.com