naneau/php-obfuscator

High-Level Structure Documentation

gburtini opened this issue · 8 comments

Really great work so far. I suspect this might come from the Symfony project or something, but I would appreciate some documentation on the high level structure to help me make some larger changes to this project. I realize this is a big ask, so I've started a bit of the work. Here's what I've understood so far (and perhaps a starting point for the documentation):

  • Console\Command provides the command-line interface via a self-explanatory class; its configuration (parameterization) all comes from a Symfony parent class. Importantly, it is expected to hold an Obfuscator, which defines the method Obfuscate. This can be considered the basic entry point, since the wrapper code that creates an Application is boilerplate.
  • Node\Visitors are the important tools that decide how to deal with the semantic content of the code. After parsing via PHP Parser, each node is "visited". Each visitor inherits from the abstract class Scrambler (scrambler.php) and defines a function enterNode, which checks properties of the node to decide whether to call the scramble function (this probably comes from PhpParser's NodeVisitorAbstract, but needs to be checked).
  • Events are currently cookie cutter wrappers for dealing with files. Seems like a compatibility layer for another library (to actually enter a file). Needs more detail about how it gets integrated (assuming there's ever plausibly going to be non-file events a user wants to interact with).
  • StringScrambler defines the class which actually decides how to scramble a variable. Used by the node visitors.
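
The visitor/scrambler relationship described in the list above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the project's PHP code: `Node`, `should_scramble` and the placeholder renaming function are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """Hypothetical stand-in for one parsed AST node."""
    kind: str  # e.g. "variable" or "method"
    name: str


class Scrambler:
    """Base visitor: owns the renaming step, subclasses own the condition."""

    def __init__(self, scramble: Callable[[str], str]):
        self.scramble = scramble  # plays the role of StringScrambler

    def should_scramble(self, node: Node) -> bool:
        raise NotImplementedError

    def enter_node(self, node: Node) -> Node:
        # Called once per node; rename only when the condition holds.
        if self.should_scramble(node):
            node.name = self.scramble(node.name)
        return node


class ScrambleVariable(Scrambler):
    def should_scramble(self, node: Node) -> bool:
        # $this must keep its name, so it is skipped.
        return node.kind == "variable" and node.name != "this"


# Placeholder renaming, only to make the effect visible.
visitor = ScrambleVariable(lambda name: "v_" + name[::-1])
nodes = [Node("variable", "total"), Node("method", "run"), Node("variable", "this")]
for node in nodes:
    visitor.enter_node(node)
print([n.name for n in nodes])  # ['v_latot', 'run', 'this']
```

Keeping the condition in the subclass and the renaming in an injected callable mirrors the split between the visitors and StringScrambler described above.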

My main interest in this is modifying it so it can scramble public class functions when obfuscating a "full project" (aka we know all the calls). As far as I can tell, that is going to be a challenge because the node visitor only sees the class itself, not the other files that may call it... that said, the function scrambling must already do this, so it seems to not be impossible?

THIS THREAD DISCUSSES TWO SEPARATE ISSUES: documentation of the code "structure" and the implementation of a mechanism for obfuscating non-private methods.

Pinging @basilfx because he has made some large structural changes to his fork of this project (I'm certain he currently has a better understanding of the project's structure than I do!). I would be eager to help merge his fork (it might be very simple) as it has nice tests.

It should be possible to add consistent scrambling by re-using the same hash method (method names should be hashed the same regardless of where they occur). The problem with this is that it massively increases the risks of errors and bugs. As an optional feature it really makes sense though.
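
A minimal sketch of that idea, in Python for brevity: if the scrambled name depends only on the original name (plus a project-wide salt, which is an assumption of this sketch), a definition in one file and a call in another end up with the same name even though the files are processed independently. The token tuples, `scramble_name` and `scramble_file` are hypothetical.

```python
import hashlib


def scramble_name(name: str, salt: str) -> str:
    """Deterministic: the same (salt, name) always gives the same output."""
    digest = hashlib.sha1((salt + name).encode("utf-8")).hexdigest()
    return "m" + digest[:10]  # leading letter keeps it a valid identifier


def scramble_file(tokens, salt):
    """Rename method definitions and calls; each file is handled on its own."""
    method_kinds = ("method_def", "method_call")
    return [
        (kind, scramble_name(name, salt) if kind in method_kinds else name)
        for kind, name in tokens
    ]


# Two "files": one defines computeTotal, the other calls it.
file_a = [("class", "Invoice"), ("method_def", "computeTotal")]
file_b = [("variable", "invoice"), ("method_call", "computeTotal")]

salt = "project-wide-salt"  # assumption: one salt shared by the whole run
out_a = scramble_file(file_a, salt)
out_b = scramble_file(file_b, salt)

# Definition and call agree without any shared state between the files.
print(out_a[1][1] == out_b[1][1])  # True
```

What this does not cover is any call whose method name is only known at run time, which is where the risk of bugs comes from.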

My fork extends the capabilities of the obfuscators. As I can live with the fact that I don't have to scramble public/protected methods, I did not add this functionality. Therefore I only explain the Visitors, if you don't mind :-)

All of the logic resides in Node\Visitors. They work per-file. I suspect you should look at Obfuscator if you want something that works per-project.

The original implementation has one PhpParser\NodeTraverser: this is the instance that will invoke all visitors. Each visitor has four callbacks (beforeTraverse, enterNode, leaveNode and afterTraverse), which are documented here. An important observation is that the enterNode and leaveNode callbacks of all visitors are invoked one after another for each node, while beforeTraverse and afterTraverse are invoked one after another on the array of ALL nodes.
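
The call order described above can be made visible with a toy traverser. This is a Python sketch of the behaviour as described in this comment, not the library's code; the dictionary-based nodes and `RecordingVisitor` are hypothetical, and the exact ordering among visitors inside leaveNode is an assumption that may differ between library versions.

```python
class RecordingVisitor:
    """Logs every callback so the interleaving can be inspected."""

    def __init__(self, label, log):
        self.label, self.log = label, log

    def before_traverse(self, nodes):
        self.log.append(f"{self.label}.beforeTraverse")

    def enter_node(self, node):
        self.log.append(f"{self.label}.enterNode({node['name']})")

    def leave_node(self, node):
        self.log.append(f"{self.label}.leaveNode({node['name']})")

    def after_traverse(self, nodes):
        self.log.append(f"{self.label}.afterTraverse")


def traverse(nodes, visitors):
    # Whole-array callbacks: every visitor sees all nodes at once.
    for visitor in visitors:
        visitor.before_traverse(nodes)

    def walk(node):
        # Per-node callbacks: all visitors run before moving to the next node.
        for visitor in visitors:
            visitor.enter_node(node)
        for child in node.get("children", []):
            walk(child)
        for visitor in visitors:
            visitor.leave_node(node)

    for node in nodes:
        walk(node)
    for visitor in visitors:
        visitor.after_traverse(nodes)


log = []
tree = [{"name": "class", "children": [{"name": "method"}]}]
traverse(tree, [RecordingVisitor("A", log), RecordingVisitor("B", log)])
print("\n".join(log))
```

The log starts with both beforeTraverse calls, then alternates A and B for each node, and ends with both afterTraverse calls.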

Due to this, I added the concept of multiple passes, because in the first pass I want to traverse the Abstract Syntax Tree (AST) to resolve the type information, while the second (and even more if needed) pass can transform the AST. In my particular case, the type information is needed for tracking types (as far as possible) and resolving use-statements.
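
A simplified Python sketch of the two-pass idea: the first pass only reads the tree and collects information, the second pass uses it to transform. Collecting private method names stands in for the type resolution mentioned above, and the node dictionaries and class names are hypothetical.

```python
class CollectPrivateMethods:
    """Pass 1: read-only, gathers the names that may be renamed."""

    def __init__(self):
        self.names = set()

    def enter_node(self, node):
        if node["kind"] == "method_def" and node["visibility"] == "private":
            self.names.add(node["name"])


class RenameCollected:
    """Pass 2: transforms nodes using what pass 1 found."""

    def __init__(self, collected):
        self.collected = collected

    def enter_node(self, node):
        is_method = node["kind"] in ("method_def", "method_call")
        if is_method and node["name"] in self.collected:
            node["name"] = "renamed_" + node["name"]


def traverse(nodes, visitor):
    for node in nodes:
        visitor.enter_node(node)


nodes = [
    {"kind": "method_call", "name": "helper"},  # call appears before the definition
    {"kind": "method_def", "name": "helper", "visibility": "private"},
    {"kind": "method_def", "name": "run", "visibility": "public"},
]

collector = CollectPrivateMethods()
traverse(nodes, collector)                         # pass 1
traverse(nodes, RenameCollected(collector.names))  # pass 2
print([n["name"] for n in nodes])  # ['renamed_helper', 'renamed_helper', 'run']
```

With a single pass, the call on the first line would be visited before anything is known about `helper`; separating the passes removes that ordering problem.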

The current (global) setup of a visitor is to scan the AST in the beforeTraverse method (which calls itself recursively, thus re-implementing a traverser), and to transform the AST in the enterNode method. I have noticed that this is a limitation, for instance in ScrambleUse. An example where the current version will fail is this one. Mine, however, will pass. The reason is that the current implementation scans the whole file at once and will define a mapping for TheException (see the example) twice, not taking into account namespaces, classes and so forth.
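
The kind of collision described here can be illustrated with a small Python sketch. It does not reproduce the linked example; the namespaces and fully qualified names below are hypothetical, and the only point is the difference between a file-wide mapping and one keyed by scope.

```python
# (namespace, alias, fully qualified target) for each use-statement in one file.
# All names are made up for illustration.
use_statements = [
    ("App\\First", "TheException", "Vendor\\A\\TheException"),
    ("App\\Second", "TheException", "Vendor\\B\\TheException"),
]

# File-wide mapping: the alias is the only key, so the second entry
# silently replaces the first.
flat = {}
for _namespace, alias, target in use_statements:
    flat[alias] = target

# Scope-aware mapping: the namespace is part of the key, so both survive.
scoped = {}
for namespace, alias, target in use_statements:
    scoped[(namespace, alias)] = target

print(len(flat), len(scoped))  # 1 2
```

Any renaming that consults `flat` would treat both uses of the alias as the same class, which is wrong for one of the two namespaces.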

Oh, I only added samples because I haven't found a good way to add tests. This should, however, not be very difficult, but it requires a lot of samples for a lot of different situations.

Thanks guys. The idea of reusing the hash function is probably all I needed to figure out how to do this. Your explanation is very good @basilfx, and I'd encourage combining it with my partial documentation and sticking it in the Wiki or README.MD somewhere.

An easy (?) way to transform samples into at least end-to-end tests would be to define a bunch of samples that have known expected output (perhaps make them all output "PHP OBFUSCATOR" or something). Then, a short script could run each file before and after obfuscation and ensure that the output matches and there are no errors.

In this project more than many others, it may be important for that script to have access to multiple versions of PHP.
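
Such a script could look roughly like the Python sketch below. It assumes the obfuscated copy of each sample already exists on disk; `check_sample`, the file paths and the interpreter names in the usage comment are hypothetical.

```python
import subprocess


def run_script(interpreter, path):
    """Run one script and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        [*interpreter, path], capture_output=True, text=True, timeout=30
    )
    return result.returncode, result.stdout, result.stderr


def check_sample(interpreters, original, obfuscated, expected="PHP OBFUSCATOR"):
    """Return (interpreter, path) pairs that errored or printed the wrong output."""
    failures = []
    for interpreter in interpreters:
        for path in (original, obfuscated):
            code, out, err = run_script(interpreter, path)
            if code != 0 or err.strip() or out.strip() != expected:
                failures.append((tuple(interpreter), path))
    return failures


# Example call; the binary names and paths are assumptions about the
# local setup, one entry per PHP version to test against:
# failures = check_sample(
#     [["php7.4"], ["php8.2"]], "samples/closure.php", "out/closure.php"
# )
```

An empty list means both the original and the obfuscated sample produced the expected output on every interpreter.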

I have confirmed that it is pretty simple to implement this change (but prone to bugs). For example, as the most naive implementation, changing the scanMethodDefinitions check to obfuscate all ClassMethods and FunctionNodes regardless of MODIFIER_PRIVATE and then removing the isLocal check from enterNode in the same Scramble[Private]Method class does what I need.

I will work towards adding an appropriate option for this behavior with the understanding that such an option requires you to exclude methods from the obfuscation to create an "access point" to the code.
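
The decision such an option has to make can be sketched as a single predicate. This is a Python illustration, not the code in the fork: `should_scramble`, `scramble_all` and `excluded` are hypothetical names, and skipping magic methods is an extra assumption (PHP invokes methods such as `__construct` by name, so renaming them would break the class).

```python
def should_scramble(name, visibility, scramble_all=False, excluded=frozenset()):
    """Decide whether a method name may be renamed."""
    if name in excluded:
        return False  # explicit "access point" kept under its real name
    if name.startswith("__"):
        return False  # assumption: leave magic methods such as __construct alone
    # Default behaviour renames private methods only; the option lifts that limit.
    return scramble_all or visibility == "private"


entry_points = frozenset({"main"})
print(should_scramble("helper", "private"))                  # True
print(should_scramble("run", "public"))                      # False
print(should_scramble("run", "public", scramble_all=True))   # True
print(should_scramble("main", "public", True, entry_points)) # False
```

Checking the exclusion list first means an excluded name stays untouched in both modes.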

This isn't sufficiently well tested to merge (and it comes from @basilfx's fork, rather than the @naneau fork), but here in my hammer-implement branch I have added a configuration parameter called hammer mode that ignores private modifiers and obfuscates everything.

In a merge, it will be important to deal with the weird naming issue; calling everything ScramblePrivate... when it is no longer always restricted to private members is a problem.

@gburtini if you put up a PR from your branch we can move on it.

The scrambling uses a hashing method, which will create a consistent output for any given input (function name). The problem with it is mostly that PHP allows for dynamic calls, which are hard to detect.