PyFinder is an extension of PyExZ3 with added support and automatic test suite generation. It adds the following functionalities, each of which are described in further detail later within the README.
- PyExZ3 test programs with non-hashable program outputs (e.g. lists) no longer throw an error.
- Extended
SymbolicStr
to support string iteration (for char in line
) - Runtime bounding via the
--max-time
or-t
flag (see Usage) - AST rewriting to wrap constant strings with
SymbolicStr
via--rewrite_ast
(see Usage) - Multiple default seed values for symbolic input arguments via
--try-multiple-seeds
(see Usage) - Program driver to run all functions in the file via
--evaluate_all_funcs
(useful for coverage analysis, see Usage) - Automatic generation of test suites from generated test input, via
--generate_test_suite
(see Usage) - Integration with
coverage.py
, a popular coverage analysis tool that can be used to measure branch and statement coverage of Python programs.scripts/coverage.sh
is provided for this integration. - A self-service web portal for program analysis, which incorporates test suite generation, coverage analysis, and symbolic execution tree graph visualization into a single web page. Instructions for setting up the web portal can be found in
pyfinder_portal/setup.txt
.
In addition to adding functional support, we introduce additional test cases (with comment documentation) to better evaluate the strengths and weaknesses of PyExZ3. A suite of new tests for the default Z3 solver can be found under pyfinder_tests/custom_tests
, while a suite of string-based tests for the CVC solver can be found under pyfinder_tests/custom_tests/cvc
.
Finally, we provide a case study in the form of a Python Calculator program, modified for usage with PyExZ3. This can be found in the python_calc
folder, where the main program is calc.py
This code is a substantial rewrite of the NICE project's (http://code.google.com/p/nice-of/) symbolic execution engine for Python, now using the Z3 theorem prover (http://z3.codeplex.com). We have removed the NICE-specific dependences, platform-specific code, and made various improvements, documented below, so it can be used by anyone wanting to experiment with dynamic symbolic execution.
The paper Deconstructing Dynamic Symbolic Execution explains the basic ideas behind dynamic symbolic execution and the architecture of the PyExZ3 tool (as of git tag v1.0). Bruni, Disney and Flanagan wrote about encoding symbolic execution for Python in Python in the same way in their 2008 paper A Peer Architecture for Lightweight Symbolic Execution - they use proxies rather than multiple inheritance for representing symbolic versions of Python types.
In the limit, PyExZ3 attempts to explore all the feasible paths in a Python function by:
- executing the function on a concrete input to trace a path through the control flow of the function;
- symbolic executing the path to determine how its conditions depend on the function's input parameters;
- generating new values for the parameters to drive the function to yet uncovered paths, using Z3.
For small programs without loops or recursion, PyExZ3 may be able to explore all feasible paths.
A novel aspect of the rewrite is to rely solely on Python's operator overloading to accomplish all the interception needed for symbolic execution; no AST rewriting or bytecode instrumentation is required, This significantly improves the robustness and portability of PyExZ3, as well as reducing its size.
- Make sure that you use Python 32-bit (64-bit) if-and-only-if you use the Z3 32-bit (64-bit) binaries. Testing so far has been on Python 3.2.3 and 32-bit.
- Install Python 3.2.3 (https://www.python.org/download/releases/3.2.3/)
- Install the latest "unstable" release of Z3 to directory Z3HOME from http://z3.codeplex.com/releases (click on the "Planned" link on the right to get the latest binaries for all platforms)
- Add Z3HOME\bin to PATH and PYTHONPATH
- MacOS: setup.sh for Homebrew default locations for Python and Z3; see end for MacOS specific instructions
- Optional: -- install GraphViz utilities (http://graphviz.org/) -- Web Portal: see pyfinder_portal/setup.txt
-
python run_tests.py test
should pass all tests -
python pyexz3.py test\FILE.py
to run a single test from the test directory
-
Basic usage: give a Python file
FILE.py
as input. By default,pyexz3
expectsFILE.py
to contain a function namedFILE
where symbolic execution will start:pyexz3 FILE.py
-
Starting function: You can override the default starting function with
--start MAIN
, whereMAIN
is the name of a function inFILE
:- pyexz3
--start=MAIN
FILE.py
- pyexz3
-
Bounding the number of iterations (PyFinder): of the path exploration is essential when analyzing functions with loops and/or recursion. Specify a bound using the
max-iters
flag:- pyexz3
--max-iters=42
FILE.py
- pyexz3
- Bounding the runtime (PyFinder): is another way of limiting the exploration when there are loops and/or recursion.
Add the flag '-t' or
--max-time
followed by a float value to specify the number of seconds to limit program runtime.
For example, if you wanted to bound the runtime to 3 minutes:
-
pyexz3 FILE.py
-t 180
-
AST Rewriting (PyFinder): Input Python programs can be rewritten to automatically wrap string constants into
SymbolicStr
instances. This allows PyFinder to capture symbolic execution in scenarios where the method receiver would otherwise be a non-symbolic string constant. This is a very limited implementation which only considers changes in the input method, and does not handle external method calls (e.g. third party libraries and built-in python functions).- pyexz3 FILE.py
--rewrite_ast
- pyexz3 FILE.py
-
Running all functions (PyFinder): it's possible to run all functions in the specified file (FILE.py) by adding the following flag. This is most useful in combination with test suite generation (see below) and coverage analysis (see Coverage Analysis)
- pyexz3 FILE.py
--evaluate_all_funcs
- pyexz3 FILE.py
-
Multiple Seeds (PyFinder): will attempt multiple symbolic seed values (eg 0, -1, 1, '', 'foo') if exceptions occur:
- pyexz3 FILE.py
--try-multiple-seeds
- pyexz3 FILE.py
-
Test Suite Generation (PyFinder): Test suites can be automatically generated by adding a flag when calling pyexz3.py:
- pyexz3 FILE.py
--generate_test_suite
- pyexz3 FILE.py
-
Arguments to starting function: by default, pyexz3 associates a symbolic integer (with initial value 0) for each parameter of the starting function. Import from
symbolic.args
to get the@concrete
and@symbolic
decorators that let you override the defaults on the starting function:
from symbolic.args import *
@concrete(a=1,b=2)
@symbolic(c=3)
def startingfun(a,b,c,d):
...
The @concrete
decorator declares that a parameter will not be treated symbolically and
provides an initial value for the parameter.
The @symbolic
decorator declares that a parameter will be treated symbolically - the type
of the associated initial value for the argument will be used to determine the proper symbolic
type (if one exists). In the above example, parameters a
and b
are treated concretely
and will have initial values 1
and 2
(for all paths explored), and parameter c
will
be treated as a symbolic integer input with the initial value 3
(its value can change after
first path has been explored). Since parameter d
is not specified, it will be treated as a symbolic
integer input with the initial value 0:
-
Output:
pyexz3
prints the list of generated inputs and corresponding observed return values to standard out; the lists of generated inputs and the corresponding return values are returned by the exploration engine topyexz3
where they can be used for other purposes, as described below. -
Expected result functions are used for testing of
pyexz3
. If theFILE.py
contains a function namedexpected_result
then after path exploration is complete, the list of return values will be compared against the list returned byexpected_result
. More precisely, the two lists are converted into bags and the bags compared for equality. If a function namedexpected_result_set
is present instead, the list are converted into sets and the sets are compared for equality. List equality is too strong a criteria for testing, since small changes to programs can lead to paths being explored in different orders. -
Import behavior: the location of the
FILE.py
is added to the import path so that all imports inFILE.py
relative to that file will work. -
Other options
--graph=DOTFILE
--log=LOGFILE
- Grab yourself a Brew at http://brew.sh/
- Get the newest python or python3 version:
brew install python
- Have the system prefer brew python over system python:
echo export PATH='/usr/local/bin:$PATH' >> ~/.bash_profile
- - Get z3:
brew install homebrew/science/z3
- Clone this repository:
git clone https://github.com/thomasjball/PyExZ3.git
- Set the PATH:
. PyExZ3/setup.sh
(do not run the setup script in a subshell./ PyExZ3/
)
Vagrant is a cross-platform tool to manage virtualized development environments. Vagrant runs on Windows, OS X, and Linux and can manage virtual machines running on VirtualBox, VMware, Docker, and Hyper-V.
- Download Vagrant.
- Install VirtualBox or configure an alternative provider.
- Run
vagrant up
from the PyExZ3 directory. The Vagrantfile in the repository tells Vagrant to download a Debian base image, launch it with the default provider (VirtualBox), and run the scriptvagrant.sh
to provision the machine. - Once the provisioning is done you can SSH into the machine using
vagrant ssh
and PyExZ3 is ready to run. Please note that the provisioning takes a while as Git is compiled from source as Debian's Git is incompatible with CodePlex where Z3 is hosted.
By default PyExZ3 uses the Z3 to solve path predicates. Optionally, the
CVC SMT solver can be enabled with the
--cvc
argument. While the two solvers offer a similar feature set, the integration of CVC differs from Z3 in a number of ways. Most predominately, the CVC integration uses an unbounded rational number
representation for Python numbers, converting to bit vectors only for
bitwise operations. The Z3 integration uses bounded bit vectors for all
numbers. For programs that use any significant number of bitwise
operations, the default Z3-based configuration is strongly recommended.
Additionally, CVC does not support generating models for non-linear
relationships causing a few of the included PyExZ3 test cases to fail
with a LogicException
.
PyFinder can be used with coverage.py
to evaluate the quality of generated test inputs. Because coverage.py is not compatible with Python 3.2.3 (used by PyExZ3), we strongly recommend installing pyenv
which helps manage Python versions. The script scripts/coverage.sh
demonstrates how one can use coverage.py
in combination with PyExZ3 and pyenv
to automatically generate an html report containing coverage analysis for a given input program.
Note that scripts/coverage.sh
specifically relies on Python 3.6.5, as that is the most recent version that coverage.py
is built for at the time of the project.
PyFinder comes packaged with a Django web application which allows users to conduct self-service program analysis. Users may upload an input file and see PyExZ3 output, automatically generated test suite, coverage, and symbolic execution tree. Details for setting up the server locally can be found in pyfinder_portal/setup.txt
.
Due to the Python version issues mentioned in Coverage Analysis, the web application is not deployed live on any hosting services. Additionally, the application is not designed for simultaneous use by multiple users and may exhibit undesired behavior under such usage patterns.
In addition to adding functional support, we introduce additional test cases (with comment documentation) to better evaluate the strengths and weaknesses of PyExZ3. A suite of new tests for the default Z3 solver can be found under pyfinder_tests/custom_tests, while a suite of string-based tests for the CVC solver can be found under pyfinder_tests/custom_tests/cvc.
Finally, we provide a case study in the form of a Python Calculator program, modified for usage with PyExZ3. This can be found in the python_calc folder, where the main program is calc.py. Usage descriptions can be found in the program comments.