Quatrain

Quatrain (Question Answering for Patch Correctness Evaluation) is a supervised learning approach that uses a deep NLP model to classify the relatedness of a bug report to a patch description.



Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness

@inproceedings{tian2022change,
  title={Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness},
  author={Tian, Haoye and Tang, Xunzhu and Habib, Andrew and Wang, Shangwen and Liu, Kui and Xia, Xin and Klein, Jacques and Bissyand{\'e}, Tegawend{\'e} F},
  booktitle={37th IEEE/ACM International Conference on Automated Software Engineering},
  pages={1--13},
  url = {https://doi.org/10.1145/3551349.3556914}, 
  doi = {10.1145/3551349.3556914},
  year={2022}
}

Paper Link: https://dl.acm.org/doi/abs/10.1145/3551349.3556914

Catalogue of Repository

artifact_detection_model: a model to detect code in text.
data: processed and structured dataset.
experiment: scripts to obtain the experimental results of the paper.
figure: saved figures for the experiments.
preprocess: scripts to extract bug reports and commit messages.
representation: embedding representation model.
utils: scripts to deduplicate the dataset.
---------------
INSTALL.md: installation instructions.
quatrain_model.h5: pre-trained QUATRAIN model for users' custom prediction.
requirements.txt: required dependencies.
run.py: entry point for running the experiments.

Ⅰ) Dataset

  1. bug report summary: title of the bug issue.
  2. bug report description: detailed description of the bug issue.
  3. patch description: CodeTrans-generated commit message for the patch.

A) Table 1: Datasets of labelled patches.

  • data/bugreport_patch.txt: 9135 (1591:7544) pairs of bug report and commit message. Structured as `bug-id $$ bug report summary $$ bug report description $$ patchId $$ patch description $$ label`.
  • data/bugreport_patch_json_bert.pickle: BERT embeddings of the bug report and commit message pairs.
  • data/bugreport_patch_array_bert.pickle: BERT embeddings of the pairs for 10-fold cross validation.
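The record layout of data/bugreport_patch.txt described above can be read with a few lines of Python. This is a minimal sketch, not the repository's own loader: it assumes one record per line and that `$$` never occurs inside a text field. The names `BugPatchPair`, `parse_pair`, and `load_pairs` are hypothetical, and the label is kept as raw text because its encoding is not stated here.

```python
from typing import List, NamedTuple


class BugPatchPair(NamedTuple):
    bug_id: str
    summary: str
    description: str
    patch_id: str
    patch_description: str
    label: str  # kept as text; the label encoding is an open assumption


def parse_pair(line: str) -> BugPatchPair:
    """Split one '$$'-delimited record into its six fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("$$")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return BugPatchPair(*fields)


def load_pairs(path: str) -> List[BugPatchPair]:
    """Read every non-empty line of the file as one record."""
    with open(path, encoding="utf-8") as handle:
        return [parse_pair(line) for line in handle if line.strip()]
```

For example, `load_pairs("data/bugreport_patch.txt")` would return one `BugPatchPair` per line if the file follows the layout above.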

B) Collected elements

data/BugReport: bug report texts for Defects4j, Bugsjar, and Bears, structured as `bug-id $$ bug report summary $$ bug report description` in a txt file.
data/CommitMessage: commit messages written by developers or generated by CodeTrans, in json and pickle format. The json file is structured as `bug-id: commit message`.
---------------
BATS_RESULT_0.0.json: the prediction results of BATS with cut-off 0.0 on our dataset. 
BATS_RESULT_0.8.json: the prediction results of BATS with cut-off 0.8 on our dataset.
PATCHSIM_RESULT.json: the prediction results of Patch-Sim on our dataset.
PatchLabelsYe.csv: the original prediction results of ODS.
Bears_testinfo.txt: the stack failure information of test suites for Bears.   
bears_index_dict(inverse).json: dictionary of bug-id and commit-id. 
save_bugreport_patch.py: script to produce data/bugreport_patch.txt.

Ⅱ) Requirements

A) Environment

  • python 3.7 (Anaconda recommended)
  • pip install -r requirements.txt

If you don't have the python3.7 dev package, run sudo apt-get install python3.7-dev first.

B) Data elements

Download ASE2022withTextUnique.zip (which must be unzipped) and ASE_features2_bert.pickle from data in Zenodo, then set the absolute paths of these two files in experiment/config.py of this repository as below.

  1. self.path_patch ---> ASE2022withTextUnique. Original dataset with patches text and commit messages text.
  2. self.path_ASE2020_feature ---> ASE_features2_bert.pickle. The feature from Tian et al.'s ASE2020 paper for our RQ3 DL experiment.

Simplified dataset: ASE2022withText.
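The two path settings above might look roughly as follows. This is only an illustrative sketch: the `Config` class, its `data_root` argument, and the `missing_paths` helper are hypothetical stand-ins, and the real experiment/config.py may be organised differently. Only the attribute names `path_patch` and `path_ASE2020_feature` come from this README.

```python
import os


class Config:
    """Hypothetical stand-in for the class in experiment/config.py."""

    def __init__(self, data_root: str):
        # Absolute path to the unzipped ASE2022withTextUnique directory.
        self.path_patch = os.path.abspath(
            os.path.join(data_root, "ASE2022withTextUnique"))
        # Absolute path to the ASE2020 feature file used by the RQ3 DL experiment.
        self.path_ASE2020_feature = os.path.abspath(
            os.path.join(data_root, "ASE_features2_bert.pickle"))

    def missing_paths(self):
        """Return the configured paths that do not exist on disk."""
        candidates = (self.path_patch, self.path_ASE2020_feature)
        return [path for path in candidates if not os.path.exists(path)]
```

Checking `Config("/your/download/dir").missing_paths()` before launching an experiment is one way to catch a wrong path early.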

Ⅲ) Experiment

To obtain the experimental results of our paper, execute run.py with the following parameters:

A) Sec. 2.2 (Hypothesis validation)

  1. Figure 2: Distributions of Euclidean distances between bug and patch descriptions.
python run.py hypothesis
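The quantity plotted in Figure 2 is a Euclidean distance between two descriptions. A minimal sketch of that distance follows, assuming each description has already been turned into a fixed-length embedding vector; the function name and the toy vectors are illustrative only and are not taken from the repository.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy 3-dimensional vectors standing in for real embeddings.
bug_embedding = [0.0, 3.0, 0.0]
patch_embedding = [4.0, 0.0, 0.0]
print(euclidean_distance(bug_embedding, patch_embedding))  # 5.0
```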

B) Sec. 5.1 (RQ1: Effectiveness of Quatrain)

  1. Figure 5: Distribution of Patches in Train and Test Data.
  2. Table 2: Confusion matrix of Quatrain prediction.
python run.py RQ1
  1. The improved F1: a better F1 score of 0.793 by re-balancing the test data.
python run.py RQ1 balance
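The F1 and recall figures reported for RQ1 and RQ2 can be derived from a confusion matrix. The sketch below assumes the usual reading of the two recall metrics: +Recall is the share of correct patches predicted as correct, and -Recall is the share of incorrect patches predicted as incorrect. The function name and the counts in the example are made up for illustration and are not the values of Table 2.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, +Recall, -Recall and F1 from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    pos_recall = ratio(tp, tp + fn)   # correct patches that are recalled
    neg_recall = ratio(tn, tn + fp)   # incorrect patches that are filtered out
    f1 = ratio(2 * precision * pos_recall, precision + pos_recall)
    return {
        "precision": precision,
        "+recall": pos_recall,
        "-recall": neg_recall,
        "f1": f1,
    }


# Illustrative counts only.
print(classification_metrics(tp=8, fp=2, tn=6, fn=2))
```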

C) Sec. 5.2 (RQ2: Analysis of the Impact of Input Quality on Quatrain)

RQ 2.1

  1. Figure 6: Impact of length of patch description to prediction.
python run.py RQ2.1

RQ 2.2

  1. Figure 7: The distribution of probability of patch correctness on original and random bug report.
  2. The dropped +Recall: 22% (241/1073) of developer patches that were previously predicted as correct are no longer recalled by Quatrain after they have been associated with a random bug report.
python run.py RQ2.2
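The idea behind RQ 2.2, pairing a patch description with a bug report it does not belong to, can be sketched as below. How the repository actually draws the random bug report is not described here, so this is one plausible scheme under stated assumptions: every pair is a `(bug_id, bug_report_text, patch_description)` tuple, and the replacement report is sampled uniformly from the other bugs. The function name is hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str, str]  # (bug_id, bug_report_text, patch_description)


def assign_random_bug_reports(pairs: List[Pair], seed: int = 0) -> List[Pair]:
    """Re-pair each patch description with a report from a different bug."""
    rng = random.Random(seed)
    reports = {}
    for bug_id, report, _ in pairs:
        reports.setdefault(bug_id, report)
    bug_ids = sorted(reports)
    if len(bug_ids) < 2:
        raise ValueError("need at least two distinct bugs to re-pair")

    shuffled = []
    for bug_id, _, patch_description in pairs:
        # Sample among all bugs except the one the patch was written for.
        other = rng.choice([b for b in bug_ids if b != bug_id])
        shuffled.append((other, reports[other], patch_description))
    return shuffled
```

The reported drop is then a simple ratio: 241 of the 1073 previously recalled developer patches, about 22%, are no longer predicted as correct.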

RQ 2.3

  1. Figure 8: Impact of distance between generated patch description to ground truth on prediction performance.
  2. The dropped +Recall: The metric (+Recall) drops by 37 percentage points to 45% when the developer-written descriptions are replaced with CodeTrans-generated descriptions.
python run.py RQ2.3
  1. The dropped AUC: we evaluated Quatrain in a setting where all developer commit messages were replaced with CodeTrans-generated descriptions: the AUC metric dropped by 11 percentage points to 0.774, confirming our findings.
python run.py RQ2.3 CodeTrans

D) Sec. 5.3 (RQ3: Comparison Against the State of the Art)

Sec. 5.3.1 (Comparing against Static Approaches)

  1. Table 3: Quatrain vs a DL-based patch classifier.
  2. New identification: Among 9135 patches, our approach identifies 7842 patches, of which 2735 patches cannot be identified by Tian et al.'s approach (RF).
python run.py RQ3 DL
  1. Table 4: Quatrain vs BATS.
  2. New identification: 180 out of 345 patches are exclusively identified by Quatrain.
python run.py RQ3 BATS

Sec. 5.3.2 (Comparing against Dynamic Approach)

  1. Table 5: Quatrain vs (execution-based) PATCH-SIM.
  2. New identification: Most of the patches (1856/3149) that we identify are not correctly predicted by PATCH-SIM.
python run.py RQ3 PATCHSIM

E) Sec. 6.1 (Experimental insights)

  1. RF with 10-fold: RandomForest (RF) on the embeddings of the bug report and the patch based on 10-fold cross validation.
  2. RF with 10-group: RandomForest (RF) on the embeddings of the bug report and the patch based on 10-group cross validation.
python run.py insights
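The difference between the two settings above is how samples are assigned to folds. The sketch below assumes that "10-group" means the split is made by bug report, so that all patches of one bug land in the same fold and never appear in both training and test data; that reading is an assumption, and `group_k_fold` is a hypothetical helper rather than the repository's code.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def group_k_fold(groups: Sequence[Hashable],
                 k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) so that no group spans train and test."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    if k < 2 or k > len(by_group):
        raise ValueError("k must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds: List[List[int]] = [[] for _ in range(k)]
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[group])

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test
```

With plain 10-fold splitting, two patches for the same bug report can end up on opposite sides of the split; grouping by bug id rules that out.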

Ⅳ) Custom Prediction

To predict the correctness of your custom patches, you are welcome to use the prediction interface.

A) Requirements for BERT

  • BERT model client&server: 24-layer, 1024-hidden, 16-heads, 340M parameters. download it here.
  • Environment for BERT server (different from reproduction)
    • python 3.7
    • pip install tensorflow==1.14
    • pip install bert-serving-client==1.10.0
    • pip install bert-serving-server==1.10.0
    • pip install protobuf==3.20.1
    • Launch the BERT server via bert-serving-start -model_dir "Path2BertModel"/wwm_cased_L-24_H-1024_A-16 -num_worker=2 -max_seq_len=360 -port 8190
    • Change the port in BERT_Port if port 8190 is already in use.
  • Bug report text: developer-written bug report.
  • Patch description text: generate a description for each of your plausible patches with a commit message generation tool, e.g. CodeTrans. Github and API.
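Once the server is running, texts can be embedded through bert-serving-client. The following is a hedged sketch, assuming the server was started on port 8190 as above and is reachable on localhost. `BertClient` and its `encode` method belong to bert-serving-client, while the wrapper `embed_texts` is a hypothetical name, and how run.py itself talks to the server is not shown in this README.

```python
from typing import Sequence


def embed_texts(texts: Sequence[str], port: int = 8190):
    """Return one BERT embedding per input text from a running bert-serving server."""
    texts = list(texts)
    if not texts or any(not text.strip() for text in texts):
        raise ValueError("texts must be a non-empty list of non-empty strings")

    # Imported lazily so this module still loads without the client installed.
    from bert_serving.client import BertClient

    client = BertClient(port=port)  # blocks until the server answers
    try:
        return client.encode(texts)
    finally:
        client.close()
```

A call such as `embed_texts(["Missing type-checks for var_args notation"])` would return an array with one row per input text.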

B) Predict

Let's give it a try!

python run.py predict $bug_report_text $patch_description_text

For instance: python run.py predict 'Missing type-checks for var_args notation' 'check var_args properly'
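When the two texts contain quotes or other shell metacharacters, calling the same command from Python with an argument list avoids shell quoting issues. This is a small sketch, assuming it is run from the repository root where run.py lives; the helper names are hypothetical, and the format of what run.py prints is not specified here, so the output is returned unparsed.

```python
import subprocess
import sys
from typing import List


def build_predict_command(bug_report_text: str,
                          patch_description_text: str) -> List[str]:
    """Argument list equivalent to: python run.py predict <bug> <patch>."""
    return [sys.executable, "run.py", "predict",
            bug_report_text, patch_description_text]


def predict(bug_report_text: str, patch_description_text: str) -> str:
    """Run the prediction and return whatever run.py writes to stdout."""
    completed = subprocess.run(
        build_predict_command(bug_report_text, patch_description_text),
        check=True,            # raise if run.py exits with an error
        capture_output=True,
        text=True,
    )
    return completed.stdout
```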

Ⅴ) Custom Train

To re-train the QUATRAIN model on our dataset or on another one, execute the following steps.

  1. Structure your dataset as data/bugreport_patch.txt.
  2. Obtain the BERT embeddings of your dataset via experiment/save_bugreport_dataset_json.py
  3. Change self.dataset_json in experiment/config.py accordingly.
  4. Execute python run.py RQ1
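For step 1, a custom dataset has to be written in the same `$$`-delimited layout as data/bugreport_patch.txt. A minimal sketch of such a writer follows, assuming one record per line and that `$$` must not occur inside any field. `format_pair` and `write_dataset` are hypothetical helpers, and the label values expected by the training script are not stated here, so they are passed through unchanged.

```python
from typing import Iterable, Sequence


def format_pair(bug_id: str, summary: str, description: str,
                patch_id: str, patch_description: str, label) -> str:
    """Serialise one record as: bug-id $$ summary $$ description $$ patchId $$ patch description $$ label."""
    fields = [bug_id, summary, description, patch_id,
              patch_description, str(label)]
    cleaned = []
    for field in fields:
        if "$$" in field:
            raise ValueError("fields must not contain the '$$' delimiter")
        # Collapse newlines and repeated spaces so the record stays on one line.
        cleaned.append(" ".join(field.split()))
    return " $$ ".join(cleaned)


def write_dataset(path: str, records: Iterable[Sequence]) -> None:
    """Write one formatted record per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(format_pair(*record) + "\n")
```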

License

Quatrain is distributed under the terms of the MIT License, see LICENSE.