The aim of this assignment is to implement a search-based test input generation technique for a subset of Python. To achieve this, you need to first instrument the target `.py` file, and then generate inputs for the functions in the file, in the format of pytest unit test cases, in such a way that they achieve the highest statement coverage possible. To measure statement coverage, we will use coverage.py.
To complete the basic goal of this assignment, your test data generation technique should be able to achieve branch coverage against a Python program written using the following:
- variables of type `int`, as well as arithmetic operators
- `if` statements
- `for` and `while` loops
- comparison operators: `<`, `>`, `<=`, `>=`, `==`, and `!=`
- boolean predicates that use `and`, `or`, and `not`
- function calls within predicates
For the sake of simplicity, we will only target individual functions, so you only need to consider function arguments as inputs. All input variables will be of type `int`.
- You should instrument the target program (i.e., `example1.py`, etc.) so that executions with any input leave enough information for you to compute the fitness function. Additionally, you should identify which branches are in the target program, so that you can target each branch one by one. Note that it is more elegant to do the instrumentation in memory, instead of writing an instrumented file to the disk.
- Once you instrument the target, you can measure fitness for a specific branch whenever you execute the target function with arbitrary input. Then you can connect this to the search algorithm.
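One possible shape for in-memory instrumentation (a sketch, not a required design — the helper names `evaluate_condition`, `distances_true`, and `instrument` are my own, and only `<` and `==` are handled here for brevity): parse the target with the `ast` module, rewrite each comparison into a call that records branch distances, then `compile` and `exec` the modified tree without ever touching the disk.

```python
import ast

distances_true = {}   # branch id -> best observed distance to taking the true branch
distances_false = {}  # branch id -> best observed distance to taking the false branch

def evaluate_condition(bid, op, lhs, rhs):
    """Record branch distances for one comparison, then return its boolean value."""
    K = 1  # the usual constant added so that a just-failed predicate has distance > 0
    if op == "<":
        d_true, d_false = max(lhs - rhs + K, 0), max(rhs - lhs, 0)
    elif op == "==":
        d_true, d_false = abs(lhs - rhs), (K if lhs == rhs else 0)
    else:
        raise NotImplementedError(op)
    distances_true[bid] = min(distances_true.get(bid, float("inf")), d_true)
    distances_false[bid] = min(distances_false.get(bid, float("inf")), d_false)
    return d_true == 0

class BranchInstrumenter(ast.NodeTransformer):
    """Rewrite `a OP b` comparisons into calls to evaluate_condition."""
    OPS = {ast.Lt: "<", ast.Eq: "=="}

    def __init__(self):
        self.next_bid = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) == 1 and type(node.ops[0]) in self.OPS:
            self.next_bid += 1
            return ast.Call(
                func=ast.Name(id="evaluate_condition", ctx=ast.Load()),
                args=[ast.Constant(self.next_bid),
                      ast.Constant(self.OPS[type(node.ops[0])]),
                      node.left, node.comparators[0]],
                keywords=[])
        return node

def instrument(source):
    """Return a namespace containing the instrumented functions (all in memory)."""
    tree = ast.fix_missing_locations(BranchInstrumenter().visit(ast.parse(source)))
    env = {"evaluate_condition": evaluate_condition}
    exec(compile(tree, "<instrumented>", "exec"), env)
    return env
```

Running the instrumented function with any input then updates the recorded distances, which is exactly the hook the search algorithm needs.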
Handling the following will give you additional points:
- ternary expressions in predicates, e.g., `if True if x == 42 else False:` or `if 4 if x > 0 else 2 > 3:`
- the `in` operator for collections, e.g., `if x in y:` in which `y` is a set or a list
- the match-case syntax (spec and tutorial), newly introduced in Python 3.10, for branch coverage
First, implement a hill climbing algorithm for the task. Name your tool `sbst.py`. Your tool should accept a path to a Python file (`.py`) as input, and generate a unit test file that can be executed using PyTest, in the same directory as that of the target Python file. Consider the following example:
$ python sbst.py examples/example1.py
$ cd examples
$ pytest
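A minimal hill-climbing loop over integer inputs might look like the following sketch. It assumes a `fitness` callable that maps an argument tuple to the distance for the currently targeted branch (lower is better, 0 means covered) — that callable, and the parameter names, are my own illustration rather than a required interface:

```python
import random

def hill_climb(fitness, num_args, max_evals=1000, lo=-1000, hi=1000):
    """Random-restart hill climbing over integer argument vectors.

    `fitness` maps a tuple of ints to a non-negative number; 0 means the
    targeted branch was covered.
    """
    evals = 0
    while evals < max_evals:
        # random restart: pick a fresh starting point
        current = tuple(random.randint(lo, hi) for _ in range(num_args))
        best = fitness(current); evals += 1
        improved = True
        while improved and evals < max_evals and best > 0:
            improved = False
            # explore the +/-1 neighbourhood of each argument
            for i in range(num_args):
                for delta in (-1, 1):
                    neighbour = current[:i] + (current[i] + delta,) + current[i+1:]
                    score = fitness(neighbour); evals += 1
                    if score < best:
                        current, best = neighbour, score
                        improved = True
        if best == 0:
            return current  # covering input found
    return None  # budget exhausted; branch may be unreachable
```

The random restart on local optima is what lets the loop escape plateaus; the `max_evals` budget is also where the bail-out for unreachable branches naturally lives.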
The generated PyTest test file should be named: `test_` + [target file name] + `.py`. For example, the output for a target file `foo.py` should be `test_foo.py`. The generated test functions should follow this naming convention: `test_` + [target function name] + `_` + [index number]. For example, the 3rd PyTest test function for the target function named `bar` should be called `test_bar_3`.
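Putting the two conventions together, a generated `test_foo.py` for a target function `bar` might look roughly like this (the input values are illustrative only — in a real run they are whatever your search finds, and the first line would be `from foo import bar` rather than the stand-in definition used here to keep the snippet self-contained):

```python
# Illustrative shape of a generated test_foo.py for a target function `bar`.
def bar(x, y):          # stand-in for the target function under test
    return x + y

def test_bar_1():
    bar(0, 7)

def test_bar_2():
    bar(-3, 42)

def test_bar_3():       # the 3rd test for `bar`, per the naming convention
    bar(5, 5)
```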
If you want to check the coverage of your test cases, use coverage.py:
$ pip install coverage
$ pip install pytest
$ coverage run --branch -m pytest test_example1.py
$ coverage report
Name               Stmts   Miss Branch BrPart  Cover
----------------------------------------------------
example1.py           14     12      8      0     9%
test_example1.py       3      1      0      0    67%
----------------------------------------------------
TOTAL                 17     13      8      0    16%
$
Note that some branches may be unreachable. Since code reachability is undecidable, you should bail out once the optimization fails to cover a branch after a fixed number of tries.
Implement one or more additional local search algorithms from below, and show that they are more efficient than hill climbing. Your report should clearly show that the additional algorithm outperforms the hill climbing algorithm you implemented.
- Simulated Annealing
- Alternating Variable Method
- Tabu Search
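As one example of the alternatives, the Alternating Variable Method (AVM) optimizes one argument at a time: exploratory ±1 probes, then accelerating "pattern" moves that double the step size while the move keeps improving. The sketch below makes the same assumptions as before (`fitness` maps an int tuple to a distance, 0 = branch covered) and is only one way to structure it:

```python
import random

def avm(fitness, num_args, max_evals=1000, lo=-1000, hi=1000):
    """Alternating Variable Method: per-variable exploratory +/-1 probes
    followed by accelerating pattern moves in the improving direction."""
    evals = 0
    current = tuple(random.randint(lo, hi) for _ in range(num_args))
    best = fitness(current); evals += 1
    while evals < max_evals and best > 0:
        improved_any = False
        for i in range(num_args):
            # exploratory moves: probe x_i - 1 and x_i + 1
            for direction in (-1, 1):
                probe = current[:i] + (current[i] + direction,) + current[i+1:]
                score = fitness(probe); evals += 1
                if score < best:
                    current, best = probe, score
                    improved_any = True
                    # pattern moves: accelerate in the improving direction
                    step = 2 * direction
                    while evals < max_evals and best > 0:
                        probe = current[:i] + (current[i] + step,) + current[i+1:]
                        score = fitness(probe); evals += 1
                        if score >= best:
                            break  # overshot; later passes refine with +/-1
                        current, best = probe, score
                        step *= 2
                    break
        if not improved_any:  # local optimum: random restart
            current = tuple(random.randint(lo, hi) for _ in range(num_args))
            best = fitness(current); evals += 1
    return current if best == 0 else None
```

The doubling step is why AVM typically needs far fewer fitness evaluations than plain hill climbing on integer inputs — a property your report's comparison should be able to show empirically.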
Write and commit a simple report detailing your implementations, and include it as a PDF file in the repository. In particular, try to discuss the following points:
- What was your general approach towards instrumentation?
- How did you compute approach level?
- How many fitness evaluations does your algorithm require to achieve the reported coverage?
- Is there anything extra to the assignment that you have done? Bonus points, or even beyond those?
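For reference when discussing the approach level question, a common formulation from the search-based testing literature combines the approach level (how many control dependencies of the target branch the execution failed to penetrate) with a normalized branch distance at the point of divergence. This is only a reference sketch, not a required design:

```python
def normalize(d):
    """Standard branch-distance normalization into [0, 1)."""
    return d / (d + 1)

def fitness(approach_level, branch_distance):
    """Combined fitness: lower is better, 0 means the target branch was covered."""
    return approach_level + normalize(branch_distance)
```

Normalizing the branch distance keeps it strictly below 1, so the approach level always dominates: getting one control dependency closer to the target beats any improvement in the local branch distance.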