The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches
@article{tian2022best,
title={The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches},
author={Tian, Haoye and Liu, Kui and Li, Yinghua and Kabor{\'e}, Abdoul Kader and Koyuncu, Anil and Habib, Andrew and Li, Li and Wen, Junhao and Klein, Jacques and Bissyand{\'e}, Tegawend{\'e} F},
journal={ACM Transactions on Software Engineering and Methodology},
url = {https://doi.org/10.1145/3576039},
doi = {10.1145/3576039},
publisher={ACM New York, NY}
}
Paper Link: https://dl.acm.org/doi/abs/10.1145/3576039
A framework for predicting patch correctness.
- Python 3.7 (Anaconda recommended)
- pip install -r requirements.txt
Download PatchCollectingTOSEMYeUnique.zip from the data on Zenodo and unzip it, then update the absolute paths of the associated files in config_default.py of this repository as follows.
- self.path_dataset ---> PatchCollectingTOSEMYeUnique, the main dataset of labeled patches.
- self.wcv in {Bert, CC2Vec, Doc}.
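The two settings above can be sanity-checked before running anything. The following is only a minimal sketch: `ConfigSketch` and `ALLOWED_WCV` are hypothetical names used for illustration and are not the actual contents of config_default.py, and the dataset location is a placeholder.

```python
from dataclasses import dataclass
from pathlib import Path

# Embedding choices listed in this README for self.wcv.
ALLOWED_WCV = ("Bert", "CC2Vec", "Doc")


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for the two settings edited in config_default.py."""

    path_dataset: str  # absolute path to the unzipped PatchCollectingTOSEMYeUnique folder
    wcv: str = "Bert"  # which learned embedding to use

    def __post_init__(self):
        # Reject embedding names that this README does not list.
        if self.wcv not in ALLOWED_WCV:
            raise ValueError(f"wcv must be one of {ALLOWED_WCV}, got {self.wcv!r}")
        # The README asks for absolute paths.
        if not Path(self.path_dataset).is_absolute():
            raise ValueError("path_dataset should be an absolute path")


# Placeholder location (assumption): the dataset unzipped under the current directory.
cfg = ConfigSketch(
    path_dataset=str(Path.cwd() / "PatchCollectingTOSEMYeUnique"),
    wcv="CC2Vec",
)
print(cfg)
```

Validating the values up front is a design choice of this sketch; the repository itself may handle a wrong path or embedding name differently.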
To reproduce the experimental results of our paper, go to the experiment folder and execute run.py with the following parameters:
Evaluation of learned and engineered embeddings on six ML classifiers in Leopard.
python main.py experiment cvgroup single xgb
The last argument is selected from {dt, lr, nb, rf, xgb}.
B) RQ-4: Combining Learned Embeddings and Engineered Features for more Accurate Classification of Correct Patches.
Comparing the results of classifying correct patches with the combined features against those with a single feature.
python main.py experiment cvgroup combine ensemble_xgb
The last argument is selected from {ensemble_rf, naive_rf, ensemble_xgb, naive_xgb, deep_combine}.
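To sweep every option listed for the single-feature and combined-feature commands, the command lines can be generated instead of typed by hand. This is a sketch only: `build_commands` is a hypothetical helper, not part of this repository, and it merely assembles the argument lists shown above.

```python
import sys

# Last-argument choices as listed in this README.
SINGLE = ("dt", "lr", "nb", "rf", "xgb")
COMBINE = ("ensemble_rf", "naive_rf", "ensemble_xgb", "naive_xgb", "deep_combine")


def build_commands(python=sys.executable):
    """Return one argv list per experiment variant (hypothetical helper)."""
    cmds = [[python, "main.py", "experiment", "cvgroup", "single", c] for c in SINGLE]
    cmds += [[python, "main.py", "experiment", "cvgroup", "combine", c] for c in COMBINE]
    return cmds


# Print the commands; nothing is executed here.
for cmd in build_commands("python"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the directory containing main.py to launch the corresponding run.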
SHAP analysis of the feature combination.
python main.py experiment SHAP
Then, execute SHAP/display.ipynb in Jupyter notebook.
Please refer to Leopard.
- Deduplicating your dataset in self.path_dataset with the script.
python main.py deduplicate
- Training Doc2Vec
python main.py train_doc
- Generating ODS features in a JSON file
python main.py ods_feature
- Saving learned features and engineered features as NPY files
python main.py save_npy
- Saving feature for test data
python main.py save_npy_4test
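The preprocessing commands above can be chained in a small driver. This sketch assumes the steps are meant to run in the order listed, which this README does not state explicitly; `run_steps` is a hypothetical helper and defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Sub-commands of main.py, in the order they appear in this README (order is an assumption).
STEPS = ("deduplicate", "train_doc", "ods_feature", "save_npy", "save_npy_4test")


def run_steps(steps=STEPS, dry_run=True):
    """Run each step in turn; with dry_run=True only print what would be executed."""
    executed = []
    for step in steps:
        cmd = [sys.executable, "main.py", step]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True stops the chain as soon as one step fails.
            subprocess.run(cmd, check=True)
        executed.append(step)
    return executed


run_steps()
```

Calling `run_steps(dry_run=False)` from the directory containing main.py would execute the steps for real.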