A collection of the bugs used in the TSE 2012 paper on GenProg.

GenProg TSE 2012 Benchmark

This repository provides BugZoo-compatible versions of the majority of the benchmarks used in the TSE 2012 GenProg paper.


Author: Claire Le Goues

Contact: claire.legoues@gmail.com

Original: July 6, 2010

Last Revision: March 20, 2012


This README contains a roadmap for reproducing the GenProg automatic program repair experiments on the benchmarks evaluated in our TSE 2012 publication. Compared with our previous publications, TSE 2012 introduced four new benchmarks: openldap, ccrypt, php, and wu-ftpd. This tarball contains all benchmarks evaluated in TSE 2012, including those introduced previously.

The included bug scenarios assume that you are using GenProg v1.0, a.k.a. the original, "modify" version of GenProg. More recent versions exist and are described elsewhere. The instructions and scripts here can be adapted for newer versions (e.g., GenProg v2.0, a.k.a. "repair"), though we leave this to you. Expect slight differences in results between versions, since genetic programming is stochastic.

Refer to the README that accompanies the source code for documentation on satisfying dependencies, compiling modify, and using modify in detail. This README is specific to the benchmark scenarios.

Caveat 0: I have checked all READMEs to the best of my ability and generated a repair for each benchmark at least once on some architecture/OS, probably some flavor of Linux. Some benchmarks require 32-bit architectures, and some have only been successfully replicated on particular distributions; these requirements are noted where known. In general, the reproducibility of these experiments varies by OS and architecture.

Caveat 1: You should regenerate the intermediate files used or generated by the repair process (preprocessed source files, .ht and .ast files, and path files). I have included example copies of these files for each benchmark, but they are unlikely to be portable to your machine.
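One low-risk way to make sure the shipped example files are never reused by accident is to move them aside before you regenerate your own following the modify documentation. Below is a minimal bash sketch. `stash_intermediates` and the `shipped-intermediates` directory name are hypothetical and are not part of this package. The glob patterns assume the .ht/.ast/.path/.goodpath naming used here. Preprocessed sources are left alone because their file names vary from benchmark to benchmark.

```bash
# stash_intermediates [DIR]
# Hypothetical helper (not shipped with this package): move the example
# .ht, .ast and path files out of DIR into DIR/shipped-intermediates so
# that only freshly regenerated copies are used afterwards.
stash_intermediates() {
  local dir=${1:-.} dest f moved=0
  dest="$dir/shipped-intermediates"
  mkdir -p "$dest"
  for f in "$dir"/*.ht "$dir"/*.ast "$dir"/*.path "$dir"/*.goodpath; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    mv -- "$f" "$dest"/ && moved=$((moved + 1))
  done
  echo "moved $moved file(s) to $dest"
}
```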

Caveat 2: GenProg operates on pre-processed code, so the generated patches are sometimes non-obvious. Compare the pre-processed code with the original source to understand what a patch actually changes.
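When a patch touches pre-processed code, it helps to know which original source line a statement came from. Preprocessors usually record this in line markers. `gcc -E` emits lines such as `# 12 "prog.c"`, and CIL output typically uses `#line 12 "prog.c"`. The bash sketch below defines a hypothetical function, `orig_location`, which is not part of this package. It maps a line of a pre-processed file back to `source:line` using the nearest preceding marker. It assumes that the file actually contains line markers and that file names have no spaces.

```bash
# orig_location FILE LINE
# Print "source:line" for LINE of a pre-processed C file, based on the most
# recent line marker before it. A marker `# N "f"` or `#line N "f"` means
# the next line is line N of file f. Prints nothing if no marker precedes LINE.
orig_location() {
  awk -v target="$2" '
    NR == target {
      if (file != "") print file ":" (base + NR - mark - 1)
      exit
    }
    /^#(line)?[ \t]+[0-9]+[ \t]+"/ {
      mark = NR; base = $2; file = $3   # $2 = line number, $3 = "file"
      gsub(/"/, "", file)
    }
  ' "$1"
}
```

For example, `orig_location prog_comb.c 1234` would report which original file and line the statement on line 1234 of the combined source came from.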

The folders in this package are named after the program they contain. Each includes a README. In general, they also include:

Source code: the original source code, both before and after preprocessing. If modify runs on the combined source of the benchmark, the folder also includes prog_comb.c (the result of combining the source files using cilly).

prog-coverage.c or prog-cov.c (the coverage-instrumented version of the program; its file name contains some form or abbreviation of the word "coverage"), plus prog.c.path and prog.c.goodpath (the path files mentioned in Caveat 1).

test-prog.sh: a script that performs 100 trials of modify on the benchmark with appropriate parameters. A "trial" is actually up to two runs of modify with two different parameter sets, as explained in the ICSE 2009 paper; if the first run succeeds, the second does not run. You must set PATH_TO_MODIFY to point at your modify installation. For most (but not all) benchmarks, modify will not succeed on every trial.

test-good.sh and test-bad.sh (the positive and negative test case scripts), plus the expected test case outputs.

prog.c-best.c: a repaired variant, not minimized, and not necessarily the only/best fix.

minimized.c, minimized-baseline.c: the output of and input to the minimization process, respectively. minimized-baseline.c is a version of prog.c-best.c (not necessarily the one in the same directory), and minimized.c is the minimized repaired variant. Their diff is a repair/patch, though not necessarily the same as the prog.patch in the same directory. The patch must be applied to the minimized-baseline.c version of the program.

prog.patch: a minimized patch. It is not necessarily the only possible repair, nor necessarily the diff between minimized-baseline.c and minimized.c in the folder, nor necessarily derived from the *-best.c file in the folder. prog.patch must be applied to the minimized-baseline.c version of the program.

A reference .debug file
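The minimization files listed above imply a simple workflow. The diff from minimized-baseline.c to minimized.c is a patch, and a patch such as prog.patch is applied to minimized-baseline.c. Below is a minimal bash sketch of that workflow. `make_patch` and `apply_patch` are hypothetical helper names, not scripts shipped in this package, and the sketch assumes that `diff` and `patch` are installed.

```bash
# make_patch BASELINE REPAIRED OUT
# Write a unified diff from the minimization input (BASELINE) to its output
# (REPAIRED). Succeeds only if the files differ, i.e. a repair was extracted.
make_patch() {
  local status=0
  diff -u "$1" "$2" > "$3" || status=$?
  # diff exits 0 if the files are identical, 1 if they differ, 2 on error.
  [ "$status" -eq 1 ] && [ -s "$3" ]
}

# apply_patch BASELINE PATCH OUT
# Apply PATCH to a copy of BASELINE saved as OUT; BASELINE is left untouched.
apply_patch() {
  [ -f "$1" ] && [ -f "$2" ] || { echo "need an existing baseline and patch" >&2; return 2; }
  cp -- "$1" "$3" && patch "$3" < "$2"
}
```

For example, `apply_patch minimized-baseline.c prog.patch repaired.c` produces a patched copy while keeping minimized-baseline.c intact for comparison.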

Confused? Check out the README and test-prog.sh in any directory, and that'll probably get you started.
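The trial structure of test-prog.sh can be sketched in bash. Each trial makes up to two runs of modify, and the second parameter set is used only when the first run fails. This is a hypothetical illustration, not the shipped script:

- `run_trials` is an invented name.
- The two parameter sets are placeholders; copy the real ones from a benchmark's own test-prog.sh.
- PATH_TO_MODIFY is assumed to name the modify executable.
- The sketch assumes, without confirmation from this README, that modify exits with status 0 when it finds a repair. The real script may detect success differently, so check it before relying on this.

```bash
# run_trials TRIALS "PARAMS_1" "PARAMS_2" [ARGS...]
# Hypothetical sketch of test-prog.sh's loop, NOT the shipped script.
# PARAMS_1/PARAMS_2: placeholder parameter sets; ARGS go to every run.
# ASSUMPTION: a run of modify exits with status 0 when it finds a repair.
run_trials() {
  local trials=$1 successes=0 t
  local -a first=() second=()
  [ -n "${PATH_TO_MODIFY:-}" ] || { echo "set PATH_TO_MODIFY first" >&2; return 2; }
  read -r -a first <<< "$2"
  read -r -a second <<< "$3"
  shift 3
  for ((t = 1; t <= trials; t++)); do
    # The second parameter set is tried only if the first run fails.
    if "$PATH_TO_MODIFY" "${first[@]}" "$@" ||
       "$PATH_TO_MODIFY" "${second[@]}" "$@"; then
      successes=$((successes + 1))
    fi
  done
  echo "$successes/$trials trials found a repair"
}
```

A call such as `PATH_TO_MODIFY=/path/to/modify run_trials 100 "<first set>" "<second set>" <benchmark args>` prints how many of the 100 trials found a repair.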

NOTE: Not every "random" run of modify leads to a fix. However, every program included in this package can be fixed by some run of modify, so if you're getting nothing, something is wrong. If you get an "assert failed" error, every variant in the first generation had a "fitness failure". This probably means either that the variants are not compiling or that your scripts aren't running (check the permissions).
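Because an "assert failed" often comes down to scripts that cannot run, a quick permission check is worth doing first. Below is a minimal bash sketch. It assumes that the scripts in question are the test-good.sh and test-bad.sh in the benchmark folder. `check_test_scripts` is a hypothetical helper, not part of this package, and it does not check whether the variants compile.

```bash
# check_test_scripts [DIR]
# Hypothetical helper: report test scripts in DIR that are missing or not
# executable. Returns 0 if both are present and executable, 1 otherwise.
check_test_scripts() {
  local dir=${1:-.} f status=0
  for f in "$dir/test-good.sh" "$dir/test-bad.sh"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"; status=1
    elif [ ! -x "$f" ]; then
      echo "not executable: $f (try: chmod +x '$f')"; status=1
    fi
  done
  return "$status"
}
```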