DynaPyt: A Dynamic Analysis Framework for Python

DynaPyt is a dynamic analysis framework designed and developed by Aryaz Eghbali and Michael Pradel. The framework provides hooks for a variety of runtime events in multiple layers of abstraction. Users can create arbitrary dynamic analyses by implementing relevant hooks. Beyond observing runtime behavior, DynaPyt also supports manipulation of behavior, e.g., to inject runtime values or modify branching decisions.


Installation

Installation with pip

Run

pip install dynapyt

Installation from source

  1. Download source:
git clone https://github.com/sola-st/DynaPyt.git
  2. Install requirements:
pip install libcst
# or
cd DynaPyt
pip install -r requirements.txt
  3. Install DynaPyt:
cd DynaPyt
pip install .

Implementing an Analysis

An analysis is a subclass of BaseAnalysis. See the analysis folder for examples.
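As a rough illustration of what such a subclass can look like, here is a minimal sketch of an analysis that counts binary operations. The class name BinaryOpCounter is hypothetical. The import path, the binary_operation hook and its parameters, and the end_execution hook are assumptions; check them against the API reference and the examples in the analysis folder before relying on them. The fallback base class exists only so the sketch runs without DynaPyt installed.

```python
from collections import Counter

try:
    # Import path assumed by analogy with dynapyt.analyses.TraceAll.TraceAll.
    from dynapyt.analyses.BaseAnalysis import BaseAnalysis
except ImportError:
    # Stand-in so the sketch can be run without DynaPyt installed.
    class BaseAnalysis:
        def __init__(self, *args, **kwargs):
            pass


class BinaryOpCounter(BaseAnalysis):
    """Hypothetical analysis that counts binary operations per operator."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.counts = Counter()

    # Hook name and parameters are assumptions; verify against the API reference.
    def binary_operation(self, dyn_ast, iid, op, left, right, result):
        self.counts[op] += 1
        # Returning None is assumed to leave the observed result unchanged.
        return None

    # Assumed to be called once when the analyzed program finishes.
    def end_execution(self):
        for op, n in sorted(self.counts.items(), key=lambda item: str(item[0])):
            print(f"{op}: {n}")
```

An analysis like this only observes behavior. An analysis that manipulates behavior would return a replacement value from a hook instead of None, assuming the hook supports it.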

Instrumenting Python Code

Note: DynaPyt instruments code in place, keeping the original of each instrumented file as a .py.orig copy. For convenience, we suggest instrumenting a copy of the code under analysis.
To run the instrumentation on a single file:

python -m dynapyt.instrument.instrument --files <path to Python file> --analysis <analysis class full dotted path>

To run the instrumentation on all files in a directory:

python -m dynapyt.run_instrumentation --directory <path to directory> --analysis <analysis class full dotted path>

Note that instrumented files might not be portable.
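Since instrumentation happens in place, one way to follow the suggestion of working on a copy is sketched below. It copies the project and then invokes the directory-wide instrumentation command from this section. The helper names (copy_project, instrumentation_command, instrument_copy) are hypothetical and not part of DynaPyt.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def copy_project(src, dst):
    """Copy the project so the original tree is never instrumented."""
    return Path(shutil.copytree(src, dst))


def instrumentation_command(directory, analysis):
    """Build the directory-wide instrumentation command from this section."""
    return [
        sys.executable, "-m", "dynapyt.run_instrumentation",
        "--directory", str(directory),
        "--analysis", analysis,
    ]


def instrument_copy(src, dst, analysis):
    """Copy src to dst, then instrument the copy (requires DynaPyt)."""
    work_dir = copy_project(src, dst)
    subprocess.run(instrumentation_command(work_dir, analysis), check=True)
    return work_dir
```

For example, instrument_copy("rich", "rich_instrumented", "dynapyt.analyses.TraceAll.TraceAll") would leave the rich directory untouched and instrument only the copy.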

Running an Analysis

To run an analysis:

python -m dynapyt.run_analysis --entry <entry file (python)> --analysis <analysis class full dotted path>

Note: The analysis passed here should either be the same analysis used for the instrumentation, or implement only a subset of the hooks of the analysis used for the instrumentation.

Single command to instrument and run an analysis on a project:

python -m dynapyt.run_all --directory <directory of project> --entry <entry file (python)> --analysis <analysis class full dotted path>

New

You can now run your instrumented code directly, and the analyses are triggered automatically. This makes it possible to run tests through test frameworks like tox. To run the analysis in this mode:

  1. Instrument your code with the appropriate analysis.
  2. Set the environment variable DYNAPYT_SESSION_ID to a unique ID (e.g. using uuid).
  3. Create a file in your OS's temporary directory named dynapyt_analyses-<unique ID from 2.>.txt and list the analyses you want, one per line.
  4. If you want to collect analysis coverage, set DYNAPYT_COVERAGE environment variable to the coverage directory.
  5. Run your code as you did before.
  6. To merge coverage and outputs run python -m dynapyt.post_run --coverage_dir=<coverage directory> --output_dir=<output directory>. The coverage will be stored in <coverage_dir>/coverage.json and the outputs will be stored in <output_dir>/output.json.
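Steps 2 to 4 above can be scripted. The following is a minimal sketch that generates a session ID, writes the analyses file into the temporary directory, and returns an environment to run the code with. The helper name prepare_session is hypothetical, and ending the file with a trailing newline is an assumption about what DynaPyt accepts.

```python
import os
import tempfile
import uuid
from pathlib import Path


def prepare_session(analyses, coverage_dir=None):
    """Return (env, analyses_file) for running instrumented code directly."""
    session_id = str(uuid.uuid4())

    # Step 3: one analysis per line, in the OS temporary directory.
    analyses_file = Path(tempfile.gettempdir()) / f"dynapyt_analyses-{session_id}.txt"
    analyses_file.write_text("\n".join(analyses) + "\n")

    # Step 2 (and optionally step 4): environment for the child process.
    env = dict(os.environ)
    env["DYNAPYT_SESSION_ID"] = session_id
    if coverage_dir is not None:
        env["DYNAPYT_COVERAGE"] = str(coverage_dir)
    return env, analyses_file
```

The returned environment can then be passed to whatever normally runs the code, for example subprocess.run(["tox"], env=env, check=True), followed by the dynapyt.post_run command from step 6.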

Available Hooks

Check out this auto-generated API reference for available hooks.


Reproducing the Results in the Paper

To reproduce the results from the paper, follow the instructions below.
We suggest running each experiment in a fresh Python environment.

First, install DynaPyt using the instructions above.

Analyzing the Python Projects (RQ1, RQ2, and RQ4)

Option 1: Run using the experiment.sh script.
First, download the desired project from GitHub (git clone ...), and put it under ./test/PythonRepos/<package name>.
Then, if the project can be installed with just pip install ., skip this step. Otherwise, place a myInstall.sh script in the root directory of the project with all steps required for installing the package.
Finally, run

bash ./experiment.sh <package name> <test directory> <analysis class full dotted path>
# E.g. for "rich" repository located at test/PythonRepos/rich:
bash ./experiment.sh rich test TraceAll

to run the analysis, or

bash ./experiment.sh <package name> <test directory> original

to run the original code (uninstrumented and unanalyzed).

Option 2: Run manually. For each project (for example, Textualize/rich):

  1. Clone the repository (or download the zip):
git clone https://github.com/Textualize/rich.git
cd rich
  2. Instrument all files in the project (for original execution, skip this step):
python -m dynapyt.run_instrumentation --directory . --analysis dynapyt.analyses.TraceAll.TraceAll
  3. Add an entry file (run_all_tests.py) that runs all tests into the root directory:
import pytest

pytest.main(['-n', '8', '--import-mode=importlib', './tests'])

Replace './tests' with the path to test files in the project.

  4. Install the package of the project-under-analysis (may need extra steps for other packages):
pip install .
  5. Run tests:
    To run the analysis:
time python -m dynapyt.run_analysis --entry run_all_tests.py --analysis dynapyt.analyses.TraceAll.TraceAll

To replicate the results for the original execution (neither instrumented nor analyzed through DynaPyt), perform steps 1, 3, and 4. Then run

python run_all_tests.py

Running Other Analyses (RQ3)

DynaPyt Analyses
To run other DynaPyt analyses, use the appropriate name (the class name) for --analysis in both the instrumentation and the analysis scripts.

Python's built-in trace
To run Python's sys.settrace (used as a baseline in the paper):
From the manual instructions above (Option 2), follow step 1, skip steps 2 and 3, run step 4, and then put the following code in run_all_tests.py:

from sys import settrace
from trc import _trace_opcodes_ # or _trace_assignments_ or _trace_all_
import pytest

settrace(_trace_opcodes_)
pytest.main(['-n', '8', '--import-mode=importlib', './tests'])

Copy src/nativetracer/trc.py beside run_all_tests.py.
Then run

time python run_all_tests.py
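For readers unfamiliar with sys.settrace, the sketch below shows the general shape of a trace function of the kind used for this baseline. It is not the content of src/nativetracer/trc.py. The names trace_events, traced, and add are hypothetical and serve only to illustrate the mechanism.

```python
import sys
from collections import Counter

event_counts = Counter()


def trace_events(frame, event, arg):
    """Count every trace event the interpreter reports for traced frames."""
    # Request per-opcode events for this frame in addition to line events.
    frame.f_trace_opcodes = True
    event_counts[event] += 1
    # Returning the function keeps tracing active inside this frame.
    return trace_events


def traced(fn, *args):
    """Run fn(*args) with tracing enabled, then switch tracing off again."""
    sys.settrace(trace_events)
    try:
        return fn(*args)
    finally:
        sys.settrace(None)


def add(a, b):
    total = a + b
    return total


result = traced(add, 2, 3)
print(result, dict(event_counts))
```

The exact event counts depend on the Python version, so the overhead measured this way should be compared only across runs with the same interpreter.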

Citation

Please refer to DynaPyt via our FSE'22 paper:

@InProceedings{fse2022-DynaPyt,
  author    = {Aryaz Eghbali and Michael Pradel},
  title     = {Dyna{P}yt: A Dynamic Analysis Framework for {P}ython},
  booktitle = {{ESEC/FSE} '22: 30th {ACM} Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year      = {2022},
  publisher = {{ACM}},
}