Cerebro: Static Subsuming Mutant Selection, IEEE Transactions on Software Engineering (TSE)

License: Apache License 2.0

This repository contains the code, dataset, and trained models for the paper "Cerebro: Static Subsuming Mutant Selection", published in IEEE Transactions on Software Engineering (TSE).

The paper is available here: Paper

The bib entry for citing the paper is available here: Cite

The dataset is composed of the following:

  1. Codebases gathered for 48 GNU Coreutils [1] programs written in C and 10 Java projects from Apache Commons Proper [2], Joda-Time [3], and Jsoup [4];

  2. Mutant information in JSON format for every program/project, with Mutant ID, Source Code File Name, Mutation Type, and Line #;

  3. Subsuming Mutant Label information in JSON format, mapped to every mutant by ID, for every program/project;

  4. Abstracted Code for every original source code file and mutant for every program/project; and

  5. Mutant Annotation Sequences in pairs of lhs (input) and rhs (expected output) for all mutants in every project/program, with mappings between Sequence File Indexes and Mutant IDs, and Sequences and Original Code File Indexes.


Tools/dependencies required before executing the code:

  1. Apache Maven ( available here: https://maven.apache.org/download.cgi )
  2. srcML ( available here: https://www.srcml.org/ )
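Before building, it can help to confirm that the required tools are reachable. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the executable names `java`, `mvn`, and `srcml` are assumed to be the ones on your PATH.

```shell
# Hypothetical helper (not part of Cerebro): prints every tool name that is
# NOT found on the PATH, one per line. Prints nothing if all are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed executable names for Java, Apache Maven, and srcML.
missing_tools java mvn srcml
```

An empty output means all three tools were found.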

NOTE: do not forget to modify the variables below in the data.java file to specify your repository locations and/or dependencies:

static String dirDataset = "D:/ag/github/Cerebro/dataset";


Commands to execute:

mvn clean package

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar [arguments]

options based on tasks:

to prepare dataset for model training:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep [language] [sequence-length] [abstraction-level]

where,

available options for [language] are c or java

[sequence-length] is the desired number of tokens in a sequence (numeric value) e.g. 25 / 50 / 100

available options for [abstraction-level] are full and partial

for example, to create the dataset for the Java projects with sequence length 100 and full abstraction, execute the command below:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep java 100 full

to create the dataset for the C projects with sequence length 50 and no abstraction (only code comments removed), execute the command below:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep c 50 partial


to test the performance of the model by evaluating the model-generated sequences:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar test [language] [sequence-length] [abstraction-level]

values for [language], [sequence-length], and [abstraction-level] are the same as described above.


to generate XMLs for input in simulation:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar combinetosimulate [language] [sequence-length] [abstraction-level]

values for [language], [sequence-length], and [abstraction-level] are the same as described above.
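The three tasks above (prep, test, combinetosimulate) share one argument pattern. As a minimal sketch, the hypothetical helper below validates the arguments against the options listed above and prints the resulting command instead of running it. The names `cerebro_cmd` and `CEREBRO_JAR` are illustrative and not part of the repository, and the default jar path assumes you run from the repository root.

```shell
# Assumption: jar path relative to the repository root; override CEREBRO_JAR as needed.
CEREBRO_JAR="${CEREBRO_JAR:-code/target/cerebro-1.0.jar}"

# Hypothetical helper: checks the arguments and prints the java command line.
# $1 = task, $2 = language, $3 = sequence length, $4 = abstraction level
cerebro_cmd() {
  case "$1" in
    prep|test|combinetosimulate) ;;
    *) echo "unknown task: $1" >&2; return 1 ;;
  esac
  case "$2" in
    c|java) ;;
    *) echo "language must be c or java" >&2; return 1 ;;
  esac
  case "$3" in
    ''|*[!0-9]*) echo "sequence length must be numeric" >&2; return 1 ;;
  esac
  case "$4" in
    full|partial) ;;
    *) echo "abstraction level must be full or partial" >&2; return 1 ;;
  esac
  echo "java -jar $CEREBRO_JAR $1 $2 $3 $4"
}

# Print (not run) two of the commands described above.
cerebro_cmd prep java 100 full
cerebro_cmd test c 50 partial
```

To actually run a task, the printed line can be copied into the terminal once the jar path is correct for your machine.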


Where to find trained models in the repo?

the trained models are available at paths of the following form:

dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model

e.g. the model trained for the Java projects with abstracted sequences of length 100 is available at:

dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
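The path pattern can be filled in mechanically. The sketch below uses a hypothetical helper named `model_dir` (not part of the repository) and assumes the fold number is passed already zero-padded, as in the example path.

```shell
# Hypothetical helper: builds the model directory from the documented pattern.
# $1 = language (c or java), $2 = sequence length, $3 = fold number (zero-padded, e.g. 01)
model_dir() {
  echo "dataset/subsuming-mutant-prediction-$1/smp/smp-$1-$2-$3/model"
}

# Reproduces the example path given above.
model_dir java 100 01
```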


Tools/dependencies that we require to train/test the models:

  1. seq2seq ( available here: https://google.github.io/seq2seq/getting_started/#download-setup )
  2. Tkinter ( available here: https://docs.python.org/3.8/library/tkinter.html )
  3. TensorFlow ( available here: https://www.tensorflow.org/install/pip )
  4. PyYAML ( available here: https://pyyaml.org/wiki/LibYAML )
  5. Perl ( available here: https://www.cpan.org/modules/INSTALL.html )

for model training:

please refer to the script train.sh available at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/train.sh

./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0

below is a sample usage for training a model for 10 epochs on the Java projects with sequence length 50 and 135,903 training samples (so the second argument is 135,903 * 10 = 1,359,030):

./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0

please refer to the configurations available in the directory Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/configs.

for sequence lengths 25, 50, and 100, please use length_26-g-1-2, length_51-g-1-2, and length_101-g-1-2, respectively.
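The train.sh arguments follow from the directory, the sequence length, the number of training samples, and the number of epochs. As a minimal sketch, the hypothetical helper `train_cmd` below (not part of the repository) computes the second argument as samples times epochs, picks the configuration for the three sequence lengths listed above, and prints the command without running it.

```shell
# Hypothetical helper: prints the train.sh command line for a given setup.
# $1 = model directory, $2 = sequence length, $3 = training samples, $4 = epochs
train_cmd() {
  # Only the three documented sequence lengths are mapped to a configuration.
  case "$2" in
    25)  config="length_26-g-1-2" ;;
    50)  config="length_51-g-1-2" ;;
    100) config="length_101-g-1-2" ;;
    *) echo "no documented config for sequence length $2" >&2; return 1 ;;
  esac
  # Second train.sh argument: training-samples-num * epoch-num.
  steps=$(( $3 * $4 ))
  echo "./train.sh $1 $steps $1/model $config 1 $3 $3 0"
}

# Reproduces the sample usage above (10 epochs, 135903 samples, length 50).
train_cmd ../smp-java-50-01 50 135903 10
```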


for model testing:

please refer to the script test.sh available at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/test.sh

./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]

below is a sample usage that takes the trained model at ../smp-java-50-01/model and the test set at ../smp-java-50-01/test, and writes the generated sequences to the file genrhs-smp-java-50-01.txt:

./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt
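When several folds need to be tested, the same command can be generated per directory. The sketch below is illustrative only: `test_cmd` is a hypothetical helper, and naming the output file genrhs-[directory-name].txt simply follows the sample above. The loop uses the five pa-smp-java-50 fold directories mentioned in the note further down; adjust it to the directories you actually have.

```shell
# Hypothetical helper: prints the test.sh command line for one model directory.
# $1 = directory containing the test/ and model/ subdirectories
test_cmd() {
  name=$(basename "$1")
  echo "./test.sh $1/test $1/model genrhs-$name.txt"
}

# Reproduces the sample usage above.
test_cmd ../smp-java-50-01

# Print the commands for five folds (directory names taken from the note below).
for fold in 01 02 03 04 05; do
  test_cmd "../pa-smp-java-50-$fold"
done
```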

note:

a few models were larger than 100 MB in size, so each of them was split into 2 files in order to be checked in. these models are listed below:

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-02/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-03/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-04/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-05/model/model.ckpt.data-00000-of-00001

in the aforementioned cases, model.ckpt.data-00000-of-00001 was divided into model.ckpt.data-00000-of-00001.001 and model.ckpt.data-00000-of-00001.002
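The split files need to be rejoined before the model can be loaded. The sketch below assumes the two parts are plain byte-wise pieces that can be concatenated in order; if they were produced by an archiver with its own container format, use that tool instead. `join_parts` is a hypothetical helper name, and the paths assume you run from the repository root.

```shell
# Hypothetical helper: rebuilds a file from its .001 and .002 parts.
# Assumption: the parts are raw byte splits of the original file.
# $1 = path of the original (unsplit) file
join_parts() {
  if [ ! -f "$1.001" ] || [ ! -f "$1.002" ]; then
    echo "missing parts for $1" >&2
    return 1
  fi
  cat "$1.001" "$1.002" > "$1"
}

# Rejoin the five models listed above; folds whose parts are absent are skipped.
for fold in 01 02 03 04 05; do
  f="dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-$fold/model/model.ckpt.data-00000-of-00001"
  if [ -f "$f.001" ]; then
    join_parts "$f"
  fi
done
```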


References

[1] GNU Coreutils. https://www.gnu.org/software/coreutils/ (last accessed April 24, 2021).

[2] Apache Commons Proper. https://commons.apache.org (last accessed April 24, 2021).

[3] Joda-Time. https://github.com/JodaOrg/joda-time/ (last accessed April 24, 2021).

[4] Jsoup. https://github.com/jhy/jsoup (last accessed April 24, 2021).