PHP Analysis Project

Spring 2014 Members

  • Brian Edmonds
  • Crystal Lo

Introduction

Our software parses PHP source code using HipHop VM (HHVM), a PHP compiler built by Facebook, and generates a control flow graph (CFG) for each function in the source code.

Supported Platform

We support Linux platforms that can run HHVM. Note: all testing has been done on Ubuntu 13.10.

Installation and Usage

  1. Install a prebuilt HHVM package by following the directions at: https://github.com/facebook/hhvm/wiki#installing-pre-built-packages-for-hhvm
  2. Just to be safe, add the path to hhvm to your environment (for example, to the PATH variable).
  3. Install Java.
  4. Download the repository as a zip file.
  5. Unzip the package and navigate to php2014-master/.
  6. Run ./cfg_php.sh $1, where $1 is the local path to your PHP files.
  7. Your output .dot files will be in the graphviz directory, with one CFG for every function in your PHP code.

Sample Output

This program works correctly for the function.php and gpaCalculator.php files found in the sample_output/ directory. You should see two .dot files generated for this PHP code. We generate one .dot file for each function implemented in the PHP source.

What To Expect

  • Our program creates a folder named Graphviz within the php2014-master directory. This folder contains the CFG for each function, represented as a .dot file.
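
To make the .dot format more concrete, here is a minimal sketch of how a CFG's blocks and edges might be written out as Graphviz DOT text. It is only an illustration: the class DotSketch, its Edge type, and the toDot method are hypothetical names chosen for this example, and the actual output of the project may be laid out differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these class and method names are illustrative
// assumptions, not the project's actual API.
final class DotSketch {

    /** A directed edge between two basic blocks, identified by block id. */
    static final class Edge {
        final int from;
        final int to;

        Edge(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Renders the blocks and edges of one function's CFG as DOT text. */
    static String toDot(String functionName, List<String> blockLabels, List<Edge> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(functionName).append("\" {\n");
        // One box-shaped node per basic block; the list index is the block id.
        for (int id = 0; id < blockLabels.size(); id++) {
            // Escape double quotes so the label stays a valid DOT string.
            String label = blockLabels.get(id).replace("\"", "\\\"");
            sb.append("  b").append(id)
              .append(" [shape=box, label=\"").append(label).append("\"];\n");
        }
        // One arrow per control-flow edge.
        for (Edge e : edges) {
            sb.append("  b").append(e.from).append(" -> b").append(e.to).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A made-up if/else shape: the entry block branches, then both paths rejoin.
        List<String> labels = new ArrayList<>();
        labels.add("entry");
        labels.add("then");
        labels.add("else");
        labels.add("exit");

        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(0, 1));
        edges.add(new Edge(0, 2));
        edges.add(new Edge(1, 3));
        edges.add(new Edge(2, 3));

        System.out.print(toDot("example", labels, edges));
    }
}
```

If Graphviz is installed, a .dot file can be rendered to an image with a command such as dot -Tpng example.dot -o example.png.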

Resources

Bytecode Specification for HipHop: https://github.com/facebook/hhvm/blob/master/hphp/doc/bytecode.specification

Java Documentation: http://gatech.github.io/php2014/

More Work Needs To Be Done

  • Global variables are currently ignored. See the VGetG instruction in the bytecode specification. The compiled bytecode does not provide an argument for the variable's position, and it is not yet clear how to handle this.
  • PHP custom class definitions are not supported.
  • Connecting the separate CFGs for each function into a single interconnected CFG is still in progress.
  • Exception handling is generally unsupported. This includes use of the unwinder in the bytecode. The unwinder takes control of the program once an exception is thrown.

Code Components

When initially parsing the bytecode, we organize the instructions into the following structure:

Function: a PHP function implemented by the user. A CFG is built for each individual Function object. A Function has multiple Lines.

Line: a single line in the PHP source code. We keep track of its line number within the source code. A Line has multiple bytecode Instructions.

Instruction: a bytecode instruction generated by HHVM when compiling the PHP source.

CFG: a control flow graph made up of BasicBlocks and Edges. Each CFG also keeps a list of the definitions and uses within it.
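
The containment hierarchy above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual classes: the wrapper class ModelSketch, the field names, and the allInstructions helper are all hypothetical, and the opcode strings are sample values only. The CFG side (BasicBlocks, Edges, definitions and uses) is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Function -> Line -> Instruction hierarchy.
// Field and method names are assumptions, not the project's actual code.
final class ModelSketch {

    /** One bytecode instruction: an opcode name plus its raw operand text. */
    static final class Instruction {
        final String opcode;
        final String operands;

        Instruction(String opcode, String operands) {
            this.opcode = opcode;
            this.operands = operands;
        }
    }

    /** One PHP source line and the bytecode instructions compiled from it. */
    static final class Line {
        final int lineNumber;
        final List<Instruction> instructions = new ArrayList<>();

        Line(int lineNumber) {
            this.lineNumber = lineNumber;
        }
    }

    /** One user-defined PHP function; a CFG would be built per Function. */
    static final class Function {
        final String name;
        final List<Line> lines = new ArrayList<>();

        Function(String name) {
            this.name = name;
        }

        /** Flattens all lines into one sequence, in the order lines were added. */
        List<Instruction> allInstructions() {
            List<Instruction> all = new ArrayList<>();
            for (Line line : lines) {
                all.addAll(line.instructions);
            }
            return all;
        }
    }

    public static void main(String[] args) {
        // Sample data only: a function whose source line 3 compiled to two instructions.
        Function f = new Function("add");
        Line line = new Line(3);
        line.instructions.add(new Instruction("CGetL", "0"));
        line.instructions.add(new Instruction("RetC", ""));
        f.lines.add(line);

        for (Instruction i : f.allInstructions()) {
            System.out.println(f.name + ": " + i.opcode + " " + i.operands);
        }
    }
}
```

Keeping the source line number on each Line makes it possible to map a node in the generated graph back to the PHP code it came from.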