https://github.com/kwz219/NPR-Exec
https://docs.google.com/document/d/1EEheTmFiMcvsgvCtb5BSL4YdlxLiSReweuzqKIshCkE/edit?usp=sharing
pip install bson scipy pymongo h5py javalang nltk torch transformers OpenNMT-py==2.2.0
Processing data into forms that each NPR system needs
Preprocess_RawData.py
Before preprocessing, arrange your data in a directory with the following layout:
|---data_dir
|---data.ids: each line holds an ID that identifies one data sample
|---buggy_lines: each file contains the buggy line of a sample
|---buggy_methods: each file contains the buggy method of a sample
|---buggy_classes: each file contains the buggy class of a sample
|---fix_lines: each file contains the developer patch line of a sample
|---fix_methods: each file contains the developer patch method of a sample
|---fix_classes: each file contains the developer patch class of a sample
|---metas: meta information of data samples
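The layout above can be checked before running the preprocessing script. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it assumes the six `buggy_*`/`fix_*` entries are directories, while `data.ids` and `metas` are only checked for existence because the layout does not say whether `metas` is a file or a directory.

```python
from pathlib import Path

# Entry names are taken from the layout listed above.
# Assumption: these six entries are directories holding one file per sample.
REQUIRED_DIRS = [
    "buggy_lines", "buggy_methods", "buggy_classes",
    "fix_lines", "fix_methods", "fix_classes",
]
# Assumption: these only need to exist (file or directory).
REQUIRED_ENTRIES = ["data.ids", "metas"]


def missing_entries(data_dir):
    """Return the names of expected entries that are absent from data_dir."""
    root = Path(data_dir)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_ENTRIES if not (root / name).exists()]
    return missing
```

Calling `missing_entries("data_dir")` returns an empty list when every expected entry is present, so it can serve as a quick sanity check before preprocessing.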
Raw data of NPR4J-Benchmark can be downloaded from this link: https://drive.google.com/drive/folders/1vKyABQbdvH8SuQc23VihB2INj_brrdnv?usp=sharing
To train an NPR system, you can use a simple command like this:
python train.py -model NPR_SYSTEM_NAME -config CONFIG_FILE_PATH
To use a trained NPR system to generate patches, you can use a simple command like this:
python translate.py -model NPR_SYSTEM_NAME -config CONFIG_FILE_PATH
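When training and patch generation are run for several systems in a row, the two commands above can be driven from a small script. This is a sketch under stated assumptions: `build_command` and `run_npr` are hypothetical helpers that do not exist in the repository, and the config paths in the usage note are placeholders. Only the `-model` and `-config` flags come from the commands shown above.

```python
import subprocess
import sys


def build_command(script, model, config):
    """Build the argument list for train.py or translate.py."""
    # Only the -model and -config flags from the commands above are used.
    return [sys.executable, script, "-model", model, "-config", config]


def run_npr(script, model, config):
    """Run one step and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(script, model, config), check=True)
```

For example, `run_npr("train.py", "SequenceR", "path/to/train_config")` followed by `run_npr("translate.py", "SequenceR", "path/to/translate_config")` would train a system and then generate patches with it, assuming both config files exist at those placeholder paths.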
Trained NPR models can be downloaded from this link: https://drive.google.com/drive/folders/18WmVJQwAOmcbudgHK839KYfY98JKVrEH?usp=sharing
SequenceR: 20GB for training, less than 10GB for predicting
Recoder: 40GB for training, 20GB for predicting
CODIT: less than 10GB for training and predicting
Edits: less than 10GB for training and predicting
CoCoNut (singleton mode): less than 10GB for training and predicting
Tufano: less than 10GB for training and predicting
CodeT5-ft: 40GB for training, 20GB for predicting
UniXCoder-ft: 40GB for training, 20GB for predicting
## Latest Experiment Results
Considering 9 NPR systems (Edits, Tufano, CoCoNut, CodeBERT-ft, RewardRepair, Recoder, SequenceR, CodeBert-ft, UniXCoder-ft), with a candidate number of up to 300.
Manual validation results 1: https://docs.google.com/spreadsheets/d/11oUYyEiMnDfHRONSrB9hY1smXcrroJSN/edit?usp=sharing&ouid=116802316915888919937&rtpof=true&sd=true
Manual validation results 2: latest_results/additional_result_check.xlsx