
Detecting OOP features in C++ binaries.

Primary language: Python. License: MIT.

KreO

This implements our hybrid dynamic-static technique to extract OO features from compiled C++ binaries.

Setup

Most of the components must run on Windows; however, the static analysis component must run on Linux because ROSE is only supported on Linux. KreO has been tested on an Ubuntu 20.04 host machine with a Windows 10 VirtualBox VM. The host and VM have a shared memory location, which is where KreO is located. The install instructions assume you are running a similar setup; however, any Windows/Linux combination with shared memory should work. If shared memory is not available, you can transfer the relevant files generated by the pregame (which must run on Linux) to the Windows machine. To make this transfer easy when reproducing results, all data in ./data is .gitignore'ed except for the files that are generated on Linux. Run ./data/clean_pregame.sh to remove the data generated by the pregame from the ./data directory.

Windows

The following tools should be installed on the Windows machine.

Python

Python is required on Windows and there are a few Python packages required to run the project.

  1. Install Python 3.10. You may prefer pypy for performance.

  2. Create a virtualenv inside of the project directory by running python -m venv .venv. To use a specific python executable (such as pypy), use that as the python executable with which you create the venv.

    If your operating system is running an older version of python, you may wish to use conda instead of virtualenv, because conda makes it easy to install custom python versions.

  3. Activate the virtualenv with .\.venv\Scripts\activate.bat.

  4. Install dependencies by running pip3 install -r requirements.txt.

Pintool

Pin is required for dynamic analysis.

  1. Download a recent Pin version from Intel's website. Most of our testing was on Pin 3.25. Make sure you download the Pin build for MSVC.
  2. Install Cygwin. When selecting packages to install, make sure to add make in addition to the default selections.
  3. Install a recent version of Microsoft Visual Studio with C++ support.
  4. Open up the "x86 Native Tools Command Prompt for VS" to build for 32-bit.
  5. Add the Cygwin binary directory to your path. Usually this is something like set PATH=%PATH%;C:\cygwin64\bin. Make sure the MSVC linker is used instead of the GNU linker utility.
  6. Set the PIN_ROOT environment variable to the extracted Pin directory, using forward slashes. For example, set PIN_ROOT=C:/Users/Mark/pin.
  7. Run make from inside the pintool directory of KreO.
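
As a convenience before running make, the PIN_ROOT requirement from step 6 can be checked with a few lines of Python. This is only a sketch and not part of KreO: the helper name check_pin_root is hypothetical, and it checks only that the variable is set and contains no backslashes.

```python
import os


def check_pin_root(environ=None):
    """Return a list of problems found with PIN_ROOT (empty if none).

    Hypothetical helper, not part of KreO. It only checks the two
    requirements stated in the setup steps: the variable must be set,
    and the path must use forward slashes.
    """
    env = os.environ if environ is None else environ
    value = env.get("PIN_ROOT", "")
    problems = []
    if not value:
        problems.append("PIN_ROOT is not set")
    elif "\\" in value:
        problems.append("PIN_ROOT should use forward slashes")
    return problems


if __name__ == "__main__":
    # Print one line per problem; print nothing if PIN_ROOT looks fine.
    for problem in check_pin_root():
        print(problem)
```

Passing a dictionary instead of relying on os.environ makes it possible to try the check without changing the real environment.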

Evaluation (optional)

Linux

ROSE

ROSE is required for static analysis. Follow the instructions on the ROSE wiki to install ROSE. We recommend (and have tested with) installing ROSE from source. It is critical that --enable-languages=binaries is specified when building ROSE. Also, when installing ROSE, specify /usr/local as the install directory.

Pregame

In the pregame directory, run make to build the pregame. Note that ROSE must be installed before building the pregame.

OOAnalyzer (optional)

To compare KreO to OOAnalyzer, you may want to install OOAnalyzer. This can be done in a container or on your Linux machine. Refer to OOAnalyzer's install instructions for installation information.

Running

There are three steps:

  1. Initial static analysis, where procedure boundaries and other preliminary information is collected.
  2. Dynamic analysis, where the program is actually run.
  3. Final static analysis, where un-executed code paths are analyzed using constant propagation. The raw output from the dynamic analysis step is also processed.

A few more steps exist when performing evaluation.

Configuration is controlled by a JSON file. All three parts take as their only argument a path to the JSON configuration, which in turn contains paths to other intermediate files created by stages 1 and 2. All configuration data currently resides in ./data/metadata.json.

When running on Windows, you first must run pregame.py on Linux (after running make in the pregame directory), then on Windows run the run_pipeline_evaluation command in cli.py, or just the run_pipeline command. Specifically, run:

python cli.py --test TEST .\data\metadata.json run-pipeline-evaluation

to run the test with key TEST. You can also run:

python cli.py .\data\metadata.json run-all-pipelines-with-evaluation

to run the pipeline for each specification in ./data/metadata.json.
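
If you want to drive these two commands from a script, for example to run several test keys in a row, the argument lists can be assembled as in the sketch below. The function pipeline_command is a hypothetical helper, not part of cli.py; it only reproduces the two command lines shown above.

```python
import sys


def pipeline_command(config_path, test_key=None):
    """Build the argument list for one cli.py invocation.

    Hypothetical helper. With a test key it mirrors
    'python cli.py --test TEST <config> run-pipeline-evaluation';
    without one it mirrors
    'python cli.py <config> run-all-pipelines-with-evaluation'.
    """
    argv = [sys.executable, "cli.py"]
    if test_key is not None:
        argv += ["--test", test_key, config_path, "run-pipeline-evaluation"]
    else:
        argv += [config_path, "run-all-pipelines-with-evaluation"]
    return argv
```

The returned list can be passed to subprocess.run(..., check=True) from the project's base directory, inside the activated virtualenv.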

Also make sure PIN_ROOT is set as described in the setup above.

The final output is a JSON file containing OO features in a structured format in ./data.

Running Lego re-implementation vs. KreO

As a baseline, we re-implemented an approach called Lego. Additional static analysis features and other improvements are built on top of this re-implementation. To specify the tool to run, use the analysis_tool JSON key. Valid tools are kreo, lego_plus, and lego. lego is the base Lego re-implementation. lego_plus includes no additional static analysis, but does include some improvements during the rest of analysis. kreo includes the improvements in lego_plus as well as additional static analysis.
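
A typo in analysis_tool is easy to catch before starting a long run. The sketch below validates the key against the three tool names listed above. It assumes you have already loaded the relevant configuration object as a Python dict; where that object sits inside metadata.json is not described here, so the function does not try to locate it. The name read_analysis_tool is hypothetical.

```python
VALID_TOOLS = ("kreo", "lego_plus", "lego")


def read_analysis_tool(config):
    """Return the analysis_tool value from a configuration dict.

    Hypothetical helper, not part of KreO. Raises ValueError if the key
    is missing or is not one of the documented tool names.
    """
    tool = config.get("analysis_tool")
    if tool not in VALID_TOOLS:
        raise ValueError(
            f"analysis_tool must be one of {VALID_TOOLS}, got {tool!r}"
        )
    return tool
```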

Generating LaTeX results

After running the evaluation pipeline, all results should be in the evaluation/results directory. Results should be labeled by project name, with -kreo, -lego, or -ooa appended to the end of each file. For example, the following results are from running the evaluation on libbmp, optparse, ser4cpp, and tinyxml2:

To compile the results into usable LaTeX tables, run python cli.py data\metadata.json generate-result-tables
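
The naming convention for result files can be used to group them by project and tool. The sketch below assumes the tool suffix comes directly before the file extension (for example, a name like tinyxml2-kreo with some extension). That placement is an assumption, and split_result_name is a hypothetical helper, not part of the evaluation scripts.

```python
from pathlib import PurePath

# Suffixes named in the text above.
TOOL_SUFFIXES = ("kreo", "lego", "ooa")


def split_result_name(filename):
    """Split a result file name into (project, tool), or return None.

    Assumes the name has the form <project>-<tool><extension>. The split
    is made at the last hyphen, so project names may contain hyphens.
    """
    stem = PurePath(filename).stem
    project, sep, tool = stem.rpartition("-")
    if not sep or not project or tool not in TOOL_SUFFIXES:
        return None
    return project, tool
```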

Reproducing Results

Included in the project is all the data required for reproducing the results for the paper associated with KreO. This includes all the JSON configuration files required for the projects evaluated as well as the executables and PDB files so you don't have to deal with building the projects yourself. All relevant data is in the data directory.

Note that you must download and place the resources directory from tinyxml2 in the project's base directory, as well as xmltest.cpp.

The shell script data/run_all_pregame.sh must be run on Linux first (from the project's base directory). Then, the batch script data/run_all.bat will run all the Windows-side evaluation scripts for all evaluated projects.

After running Lego and KreO on all projects, run data/run_all_evaluation.bat (from the project's base directory), then run evaluation/results/generate_result_tables.py.