AIBugHunter Replication Package
First, clone this repository to your local machine and enter the main directory via the following commands:
git clone https://github.com/awsm-research/AIBugHunter.git
cd AIBugHunter
Then, install the Python dependencies via the following commands:
pip install numpy
pip install torch
pip install transformers
pip install tqdm
pip install pandas
pip install scikit-learn
pip install argparse
pip install gdown
Alternatively, we provide a requirements.txt file with package versions pinned to ensure reproducibility; you can install them via the following command:
pip install -r requirements.txt
If you run into an issue with the gdown package, try the following commands:
git clone https://github.com/wkentaro/gdown.git
cd gdown
pip install .
cd ..
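You can verify the installation afterwards with:
gdown --help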
We highly recommend that you check out this installation guide for the "torch" library so you can install the version appropriate for your device.
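For reference, the guide generates pip commands of the following form (a sketch only; cu118 is illustrative, so pick the index URL matching your CUDA version):
# install a CUDA 11.8 build of torch; see the official guide for the URL matching your setup
pip install torch --index-url https://download.pytorch.org/whl/cu118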
To utilize a GPU (optional), you also need to install the CUDA library; you may want to check out this installation guide.
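After installing both, you can confirm that PyTorch can see your GPU:
# prints True if a CUDA-capable GPU is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available())"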
Python 3.9.7 is recommended; it has been fully tested without issues.
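For instance, a minimal sketch for setting up an isolated environment with the recommended interpreter (assuming python3.9 is available on your PATH):
# create and activate a virtual environment, then install the pinned dependencies
python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements.txt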
We use the Big-Vul dataset provided by Fan et al.; for more information about the dataset, please refer to this repository.
We recommend using a GPU with at least 8 GB of memory for training, since the BERT architecture is computationally intensive.
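You can check your available GPU memory with:
# lists NVIDIA GPUs along with total and used memory
nvidia-smi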
Note: If the specified batch size does not fit your device, modify --eval_batch_size and --train_batch_size to match your GPU memory.
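For example, a minimal sketch of lowering both batch sizes by editing the flags inside a train.sh script (an assumption: each train.sh passes these flags to the underlying Python script; GNU sed syntax shown):
# inspect the current batch-size flags
grep batch_size train.sh
# lower both batch sizes in place; adjust 4 to whatever fits your GPU
sed -i 's/--train_batch_size [0-9]*/--train_batch_size 4/' train.sh
sed -i 's/--eval_batch_size [0-9]*/--eval_batch_size 4/' train.sh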
First, download the experimental datasets via the following commands:
cd data
sh download_data.sh
cd ..
To retrain MO-BERT (our approach) for RQ1 (CWE-ID classification), run the following commands:
cd rq1_cwe_id_cls/mo_bert
sh train.sh
To run inference with the pre-trained model, run the following commands:
cd rq1_cwe_id_cls/mo_bert/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test.sh
To retrain the CodeBERT baseline for RQ1, run the following commands:
cd rq1_cwe_id_cls/codebert_base
sh train.sh
To run inference with the pre-trained model, run the following commands:
cd rq1_cwe_id_cls/codebert_base/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test.sh
To retrain the BERT-base baseline for RQ1, run the following commands:
cd rq1_cwe_id_cls/bert_base
sh train.sh
To run inference with the pre-trained model, run the following commands:
cd rq1_cwe_id_cls/bert_base/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test.sh
To retrain the BOW+RF (bag-of-words with random forest) baseline for RQ1, run the following commands:
cd rq1_cwe_id_cls/bow_rf
python rf_main.py
To download the pre-trained model, run the following commands:
cd rq1_cwe_id_cls/bow_rf/saved_models
sh download_model.sh
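Only download commands are provided for this baseline; if you want to inspect the downloaded model, here is a minimal sketch assuming it is a pickled scikit-learn object (the file name rf_model.pkl is hypothetical, so list the directory to find the real one):
# list the downloaded files to find the actual model file name
ls
# hypothetical file name; adjust to whatever download_model.sh fetched
python -c "import pickle; print(pickle.load(open('rf_model.pkl', 'rb')))"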
To retrain the Naive Bayes baseline for RQ1, run the following commands:
cd rq1_cwe_id_cls/naive_bayes
python naive_bayes_main.py
To download the pre-trained model, run the following commands:
cd rq1_cwe_id_cls/naive_bayes/saved_models
sh download_model.sh
To retrain MO-BERT (our approach) for RQ2 (CWE type classification), run the following commands:
cd rq1_cwe_id_cls/mo_bert
sh train.sh
To run inference with the pre-trained model, run the following commands:
cd rq1_cwe_id_cls/mo_bert/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test.sh
Note: Since our approach is a multi-task learning approach, the model is the same as the one used in RQ1 (hence the rq1_cwe_id_cls paths above).
To retrain the CodeBERT baseline for RQ2, run the following commands:
cd rq2_cwe_type_cls/bert_baseline
sh train_codebert.sh
To run inference with the pre-trained model, run the following commands:
cd rq2_cwe_type_cls/bert_baseline/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test_codebert.sh
To retrain the BERT-base baseline for RQ2, run the following commands:
cd rq2_cwe_type_cls/bert_baseline
sh train_bert_base.sh
To run inference with the pre-trained model, run the following commands:
cd rq2_cwe_type_cls/bert_baseline/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test_bert_base.sh
To retrain the BOW+RF baseline for RQ2, run the following commands:
cd rq2_cwe_type_cls/bow_rf
python rf_main.py
To download the pre-trained model, run the following commands:
cd rq2_cwe_type_cls/bow_rf/saved_models
sh download_model.sh
To retrain the Naive Bayes baseline for RQ2, run the following commands:
cd rq2_cwe_type_cls/naive_bayes
python naive_bayes_main.py
To download the pre-trained model, run the following commands:
cd rq2_cwe_type_cls/naive_bayes/saved_models
sh download_model.sh
To retrain the CodeBERT model for RQ3 (CVSS score regression), run the following commands:
cd rq3_cvss_score_reg/bert
sh train_codebert.sh
To run inference with the pre-trained model, run the following commands:
cd rq3_cvss_score_reg/bert/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test_codebert.sh
To retrain the BERT-base model for RQ3, run the following commands:
cd rq3_cvss_score_reg/bert
sh train_bert_base.sh
To run inference with the pre-trained model, run the following commands:
cd rq3_cvss_score_reg/bert/saved_models/checkpoint-best-acc
sh download_model.sh
cd ../..
sh test_bert_base.sh
To retrain the BOW+RF baseline for RQ3, run the following commands:
cd rq3_cvss_score_reg/bow_rf
python rf_main.py
To download the pre-trained model, run the following commands:
cd rq3_cvss_score_reg/bow_rf/saved_models
sh download_model.sh
To retrain the BOW+LR baseline for RQ3, run the following commands:
cd rq3_cvss_score_reg/bow_lr
python lr_main.py
To download the pre-trained model, run the following commands:
cd rq3_cvss_score_reg/bow_lr/saved_models
sh download_model.sh
- Special thanks to the providers of the Big-Vul dataset (Fan et al.)
To cite this work, please use the following BibTeX entry:
@article{fu2024aibughunter,
title={AIBugHunter: A practical tool for predicting, classifying and repairing software vulnerabilities},
author={Fu, Michael and Tantithamthavorn, Chakkrit and Le, Trung and Kume, Yuki and Nguyen, Van and Phung, Dinh and Grundy, John},
journal={Empirical Software Engineering},
volume={29},
number={1},
pages={4},
year={2024},
publisher={Springer}
}