We note that a recent work from NTU also focuses on pre-trained models for code changes. Since the two works were conducted around the same time, we did not discuss that paper in our Related Work section. We apologize for the omission and hope readers will also pay attention to that paper.
pytorch==2.0.0
torchvision==0.15.1
torchaudio
datasets==1.16.1
transformers==4.21.1
tensorboard==2.12.2
tree-sitter==0.19.1
nltk==3.8.1
scipy==1.10.1
Install the above requirements manually or execute the following script:
bash scripts/setup.sh
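If you prefer the manual route, a minimal sketch of the equivalent pip commands is shown below (this assumes a CUDA-ready environment and that PyTorch is installed from PyPI as "torch"; scripts/setup.sh remains the supported installation path):

# Sketch of a manual installation; scripts/setup.sh is the supported path.
pip install torch==2.0.0 torchvision==0.15.1 torchaudio
pip install datasets==1.16.1 transformers==4.21.1 tensorboard==2.12.2
pip install tree-sitter==0.19.1 nltk==3.8.1 scipy==1.10.1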
- Download the dataset and models:
bash scripts/download.sh
- Prepare the dataset for pre-training [optional]:
bash scripts/prepare_dataset.sh
- Pre-train the model:
bash scripts/pre-train.sh -g [GPU_ID]
- Fine-tune for commit message generation (Task 1):
bash scripts/finetune_msggen.sh -g [GPU_ID] -l [cpp/csharp/java/javascript/python/fira]
The released checkpoint may perform better than reported in the paper. If evaluation during fine-tuning takes too long, you can lower the "--evaluate_sample_size" parameter, which sets the number of validation samples used during evaluation.
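For example, assuming the wrapper script forwards this parameter to the underlying trainer (an assumption; if it does not, adjust the parameter inside the script instead), evaluating on only 500 validation samples might look like:

# Hypothetical invocation: assumes extra flags are passed through to the trainer.
bash scripts/finetune_msggen.sh -g 0 -l java --evaluate_sample_size 500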
To evaluate the performance of a specific checkpoint, add the flag "-e" followed by the checkpoint path:
bash scripts/finetune_msggen.sh -g [GPU_ID] -l [cpp/csharp/java/javascript/python/fira] -e [path_to_model]
Note that if [path_to_model] is blank, this script will automatically evaluate our released checkpoint.
Fine-tune:
bash scripts/finetune_cup.sh -g [GPU_ID]
To evaluate a specific checkpoint, add the "-e" flag followed by the checkpoint path, as in Task 1.
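Following the Task 1 pattern, the full command is presumably:

bash scripts/finetune_cup.sh -g [GPU_ID] -e [path_to_model]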
Additionally, we have released the output results of CCT5 and the baselines, stored under results/CommentUpdate. To evaluate a result file's effectiveness, execute the following script with the path to that file:
bash scripts/eval_cup_res.sh --filepath [path_to_result_file]
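For example, to score one of the released result files stored under results/CommentUpdate:

# Substitute an actual file; list results/CommentUpdate to see the released outputs.
bash scripts/eval_cup_res.sh --filepath results/CommentUpdate/[result_file]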
Fine-tune:
bash scripts/finetune_jitdp_SF.sh -g [GPU_ID]
Evaluate:
bash scripts/finetune_jitdp_SF.sh -g [GPU_ID] -e [path_to_model]
Fine-tune:
bash scripts/finetune_jitdp_SF_EF.sh -g [GPU_ID]
Evaluate:
bash scripts/finetune_jitdp_SF_EF.sh -g [GPU_ID] -e [path_to_model]
Fine-tune:
bash scripts/finetune_QE.sh -g [GPU_ID]
Evaluate:
bash scripts/finetune_QE.sh -g [GPU_ID] -e [path_to_model]
Fine-tune:
bash scripts/finetune_CodeReview.sh -g [GPU_ID]
Evaluate:
bash scripts/finetune_CodeReview.sh -g [GPU_ID] -e [path_to_model]
We reused some code from open-source projects and would like to extend our gratitude to the following repositories:
@inproceedings{lin2023cct5,
title={CCT5: A Code-Change-Oriented Pre-Trained Model},
author={Lin, Bo and Wang, Shangwen and Liu, Zhongxin and Liu, Yepang and Xia, Xin and Mao, Xiaoguang},
booktitle={Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
year={2023}
}