reverse-php-malware

De-obfuscate and reverse engineer PHP malware


An aid to de-obfuscating PHP malware

Lots of malware that afflicts WordPress, Joomla and other PHP-based web sites is written in PHP. PHP is an interpreted language, so attackers distribute malware as source code. Much of the PHP malware is obfuscated. This (PHP) program does de-obfuscation to aid human understanding of the malware.

If you come across or possess PHP malware that this program doesn't de-obfuscate, email me at bediger8@gmail.com. I will look into improving this code to handle your malware.

Why is PHP malware obfuscated?

My guess is that attackers obfuscate their PHP code for three reasons:

  1. To evade simple signature or checksum based malware detection.
  2. To attempt to keep website owners from understanding what the malware does.
  3. To keep other malware writers from "stealing" their code, or even understanding it.

I base guess no. 1 on the fact that obfuscation methods change rapidly, sometimes only getting used for a single installation of malware.

I make guess no. 2 because the obfuscation is often just a "visual confusion" thing, rather than any kind of encryption. Having assert() evaluate a single, very long line of PHP isn't going to fool any algorithm, but the human eye might glide right past it.

I base guess no. 3 on the fact that most PHP malware is either embarrassingly simple, or evolves by wholesale feature addition, even if that feature is a hidden back door, or phone-home emails. Keeping other inept programmers from understanding the code might give an individual a temporary advantage.

Features

  • It replaces strings obfuscated by Base64 (encoding and decoding), Rot13, URL encoding, reversal and some forms of compression.
  • It can de-obfuscate strings that are created by composing encoding, decoding, compression and other manipulations.
  • It can replace function names that are obfuscated by indirection (i.e. $function($arg1, $arg2...);), or by tricky use of $GLOBALS.
  • It can replace variable names that are obfuscated by the same indirections.
  • It replaces arguments of functions of special interest (eval(), fopen(), preg_replace(), etc) with de-obfuscated, or otherwise statically determined values.
  • It aggregates concatenated strings, or concatenated mixes of strings and obscuring function calls.
  • It evaluates Array() creations to allow deobfuscating strings made by concatenating array elements.
  • It pretty-prints function body arguments of create_function() invocations, composing names for the anonymous functions created that way, and uses those names to de-obfuscate.
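
As a rough illustration of the composed encodings mentioned in the list above, the following sketch builds a layered string and peels it apart again. The helper name peel_layers() and the sample payload are invented for this example and are not part of revphp; it only shows that decoding applies the inverse operations in the opposite order.

```php
<?php
// Hypothetical helper, not part of revphp: undo a known chain of encodings.
// $layers lists decoder names in the order they must be applied.
function peel_layers(string $blob, array $layers): string
{
    // Only pure string decoders are allowed; nothing here executes the payload.
    $allowed = ['base64_decode', 'str_rot13', 'strrev', 'urldecode', 'gzinflate', 'gzuncompress'];
    foreach ($layers as $fn) {
        if (!in_array($fn, $allowed, true)) {
            throw new InvalidArgumentException("decoder not allowed: $fn");
        }
        $blob = $fn($blob);
    }
    return $blob;
}

// Build a sample the way malware might: reverse, then rot13, then Base64.
$obfuscated = base64_encode(str_rot13(strrev('phpinfo();')));

// Decoding undoes the layers from the outside in.
echo peel_layers($obfuscated, ['base64_decode', 'str_rot13', 'strrev']), "\n";
```

Restricting the decoders to a fixed list is a design choice of this sketch: it keeps the decoded text as data and never hands it to eval().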

Evaluating Array() calls when creating arrays means that revphp changes its own code on-the-fly. Hopefully this doesn't lead to code injection from malware into revphp, but the possibility is there. For better or for worse, malware uses arrays of strings quite often, so some feature like this is necessary.
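
For illustration, the array-based obfuscation this feature targets often looks roughly like the fragment below. The variable names and the hidden string are invented for this example; the point is only that the function name never appears literally in the source and must be recovered by evaluating the array and the concatenation.

```php
<?php
// Invented sample of array-based string hiding.
$p = Array('s', 'y', 't', 'e', 'm');

// The real name is assembled from array elements at run time.
$f = $p[0] . $p[1] . $p[0] . $p[2] . $p[3] . $p[4];

// Print the recovered name instead of calling it.
echo $f, "\n";
```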

Installing

Use composer to retrieve the latest PHP-Parser code:

composer install

After that, everything should be in place.

Usage

Basic usage involves a file full of obfuscated PHP, and stdout:

/wherever/reverse-php-malware/revphp obfuscated.php > pretty.php

or

/wherever/reverse-php-malware/revphp -R obfuscated.php > pretty.php

The -R flag causes revphp to examine all variable names and replace those names that are indirected by various techniques.

Command line flag -C causes it to leave comments in the output code. Ordinarily it deletes comments, because who can believe comments in malware?

Should you find a function in some malware that deserves to have its arguments decoded, you can add that via a -f flag. For instance, fwrite() calls don't have their arguments de-obfuscated by default. To get revphp to do that:

/wherever/reverse-php-malware/revphp -f fwrite obfuscated.php > cleanedup.php

Occasionally you will want to rename a function (and its calls) in the de-obfuscated code. You can use the -F original=new flag:

/wherever/reverse-php-malware/revphp -F OO_000O__O=htaccess_creator obfuscated.php > cleanedup.php

In the file cleanedup.php, all calls to OO_000O__O() will appear as htaccess_creator(), and the function definition will also appear as function htaccess_creator().

Very rarely, a malware author will put in a unique obfuscating function that's not merely a composition of base64_encode(), gzinflate() and str_rot13(). In that case, you can extract the obfuscating function into its own file. revphp can read, evaluate, and use that special function during de-obfuscation:

/wherever/reverse-php-malware/revphp -D decoding_function.php obfuscated.php > cleanedup.php

The testing script runtests includes a test of a unique obfuscating function. runtests invokes this:

./revphp -r zork -D tests/zork.php tests/t1_1.php

The PHP functions in file tests/zork.php get read in and evaluated using the -D flag. The -r zork flag causes revphp to examine and replace any obfuscated arguments to invocations of function zork() in the subject PHP, tests/t1_1.php. This is a somewhat confusing example, because tests/zork.php contains the definition of function zork(), and so does tests/t1_1.php. One function zork(), the one in tests/t1_1.php, just gets de-obfuscated. The other definition of function zork(), in tests/zork.php, gets read in and evaluated by the -D flag. The -r zork flag causes revphp to invoke the read-in-and-evaluated function zork() while revphp is traversing the parse tree of tests/t1_1.php.

This closely mimics a realistic situation, where you might run revphp on some malware PHP. You find that revphp can't decode some key obfuscated strings because the malware PHP has a custom decoding function. You can extract a copy of the custom decoding function into a file, and re-invoke revphp with appropriate -D and -r flags to cause the important strings to be de-obfuscated by the custom decoding function.
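
To make that scenario concrete, here is a sketch of what an extracted decoding file might contain. The function name xor_unmask(), the key and the hex-plus-XOR scheme are all invented for this example; real malware will use its own name and its own algorithm.

```php
<?php
// Hypothetical custom decoder of the kind a malware author might write.
// The name xor_unmask and the default key are assumptions for illustration.
function xor_unmask(string $hex, string $key = 'k3y'): string
{
    $bytes = hex2bin($hex);          // the payload is stored as hex digits
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        // XOR each byte with the repeating key
        $out .= $bytes[$i] ^ $key[$i % strlen($key)];
    }
    return $out;
}
```

Saved as decoding_function.php, a file like this would presumably be used along the lines of revphp -r xor_unmask -D decoding_function.php obfuscated.php, mirroring the zork invocation above.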

Design

revphp is written in PHP, and de-obfuscates PHP, in a kind of philosophical short-circuit.

revphp uses PHP-Parser to create a parse tree from a source file, then traverses the parse tree. It keeps a global symbol table, and local symbol tables, which are created and destroyed on parse-tree-function entrance and exit.

During the traversal of the parse tree, it keeps track of assignments to variables. Any value it can de-obfuscate by base64_decode(), urldecode(), strrev(), gzinflate() and gzuncompress(), it will associate with the variable's name in the symbol table. It substitutes de-obfuscated values for obfuscated ones in the parse tree. Most of the work revphp does is evaluating (if it can) the right-hand side of assignment statements. PHP malware tends to use a lot of superfluous variables, and a lot of assignments to and from those superfluous variables. Tracking variable contents allows de-obfuscation later.
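
The idea of evaluating right-hand sides and remembering the results can be sketched without PHP-Parser. The toy expression format, the function static_value() and the constant SAFE_DECODERS below are assumptions made for this example and do not reflect revphp's actual internals.

```php
<?php
// Toy expression forms (assumed for this sketch, not PHP-Parser node types):
//   ['str', 'text']             string literal
//   ['var', 'name']             variable read
//   ['cat', $left, $right]      string concatenation
//   ['call', 'fn', [$arg, ...]] function call
const SAFE_DECODERS = ['base64_decode', 'urldecode', 'strrev', 'str_rot13', 'gzinflate', 'gzuncompress'];

// Returns the statically known string value, or null when it cannot be determined.
function static_value(array $expr, array $symbols): ?string
{
    switch ($expr[0]) {
        case 'str':
            return $expr[1];
        case 'var':
            return $symbols[$expr[1]] ?? null;
        case 'cat':
            $l = static_value($expr[1], $symbols);
            $r = static_value($expr[2], $symbols);
            return ($l === null || $r === null) ? null : $l . $r;
        case 'call':
            if (!in_array($expr[1], SAFE_DECODERS, true)) {
                return null; // never run arbitrary functions
            }
            $args = [];
            foreach ($expr[2] as $arg) {
                $v = static_value($arg, $symbols);
                if ($v === null) {
                    return null; // an unknown argument makes the call unknown
                }
                $args[] = $v;
            }
            $fn = $expr[1];
            $result = $fn(...$args);
            return is_string($result) ? $result : null;
    }
    return null;
}

// Simulates: $a = 'ZXZ'; $b = $a . 'hbA=='; $c = base64_decode($b);
$symbols = [];
$assignments = [
    ['a', ['str', 'ZXZ']],
    ['b', ['cat', ['var', 'a'], ['str', 'hbA==']]],
    ['c', ['call', 'base64_decode', [['var', 'b']]]],
];
foreach ($assignments as [$name, $rhs]) {
    $value = static_value($rhs, $symbols);
    if ($value !== null) {
        $symbols[$name] = $value; // remember it for later lookups
    }
}
echo $symbols['c'], "\n";
```

Returning null for anything that cannot be determined means unknown values simply stay obfuscated rather than being guessed.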

PHP malware tries to obfuscate function calls, both names of functions, and arguments to functions. When revphp reaches a function call in the parse tree, it tries to de-obfuscate any indirect function names (like $fn()), substituting de-obfuscated for obfuscated function names in the parse tree. If revphp happens upon a call to one of a select list of functions (some built-in, others set by the -f flag on the command line), it examines any arguments and tries to substitute de-obfuscated arguments for obfuscated arguments in the parse tree.

Keeping a global and local symbol table allows revphp to de-obfuscate constructions like this:

<?php
$glorf = 'c3lzdGVt';
$frolg = 'ZWNobyAiSGVsbG8sIHdvcmxkIg==';
// ... lots of code ...
function doBadStuff() {
    $fn = base64_decode($GLOBALS['glorf']);
    $fn(base64_decode($GLOBALS['frolg']));
}
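
A minimal sketch of the lookups needed for the example above follows. The class SymbolTables and its method names are invented for illustration and are not revphp's own code; the sketch assumes one local table per function being visited and a single global table that $GLOBALS['name'] always reads.

```php
<?php
// Hypothetical two-level symbol table; class and method names are invented.
final class SymbolTables
{
    private array $globals = [];
    private array $scopes = [];   // stack of local tables, one per function being visited

    public function enterFunction(): void { $this->scopes[] = []; }
    public function leaveFunction(): void { array_pop($this->scopes); }

    // Assignments go to the innermost local table, or to globals at top level.
    public function set(string $name, string $value): void
    {
        if ($this->scopes) {
            $this->scopes[count($this->scopes) - 1][$name] = $value;
        } else {
            $this->globals[$name] = $value;
        }
    }

    // Plain variable read: local inside a function, global at top level.
    public function get(string $name): ?string
    {
        if ($this->scopes) {
            return $this->scopes[count($this->scopes) - 1][$name] ?? null;
        }
        return $this->globals[$name] ?? null;
    }

    // $GLOBALS['name'] always reads the global table.
    public function getGlobal(string $name): ?string
    {
        return $this->globals[$name] ?? null;
    }
}

$t = new SymbolTables();
$t->set('glorf', 'c3lzdGVt');
$t->set('frolg', 'ZWNobyAiSGVsbG8sIHdvcmxkIg==');

$t->enterFunction();                                   // function doBadStuff() {
$t->set('fn', base64_decode($t->getGlobal('glorf')));  //   $fn = base64_decode(...)
$callee = $t->get('fn');                               //   resolves the indirect call
$arg = base64_decode($t->getGlobal('frolg'));
$t->leaveFunction();                                   // }

// Print the call as it would read once de-obfuscated; nothing is executed.
echo $callee, '(', var_export($arg, true), ");\n";
```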

After it has completely traversed the parse tree, revphp uses a PHP-Parser built-in pretty-printer to present the user with a (possibly de-obfuscated) source translation. Any anonymous functions created with create_function() calls have their bodies pretty printed at this time. Pretty-printing malware is about half the way towards understanding it.

The design of PHP-Parser caused me to create class RevPHPNodeVisitor extends PhpParser\NodeVisitorAbstract which contains almost all of the above functionality.

Testing

Directories zoo/ and tests/ contain pieces of PHP that illustrate obfuscations found in PHP malware. Many of the test cases are simplified extracts from malware that earlier versions of revphp had problems de-obfuscating. Invoking runtests will execute revphp against all PHP fragments in zoo/, and check the output against desired/correct outputs in desired/.

runtests also executes more complex scenarios, with code residing in tests/.