GSoC Project: LibreOffice CI Test Selection with Machine Learning
The goal of this project is to select unit tests based on (patch, test) pairs. Three models (testlabelselect, testfailure, testoverall) are trained to predict unit test results for a given patch at different levels.
The work is based on Mozilla's bugbug and rust-code-analysis.
The testlabelselect model predicts the failing probability of each unit test given the patch.
| | Fail (Predicted) | Pass (Predicted) |
|---|---|---|
| Fail (Actual) | 3860 | 203 |
| Pass (Actual) | 191593 | 1109768 |
The testfailure model predicts the overall failing probability of a patch based on patch features only.
| | Fail (Predicted) | Pass (Predicted) |
|---|---|---|
| Fail (Actual) | 614 | 527 |
| Pass (Actual) | 2155 | 4863 |
The testoverall model improves upon testfailure by using testlabelselect predictions to predict whether a patch will fail any unit test.
| | Fail (Predicted) | Pass (Predicted) |
|---|---|---|
| Fail (Actual) | 810 | 331 |
| Pass (Actual) | 2413 | 4605 |
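To make the improvement of testoverall over testfailure concrete, the counts in the two tables above can be turned into recall and precision. This is a minimal sketch: the dictionary layout and the function names are illustrative and not part of the repository.

```python
# Confusion-matrix counts copied from the two tables above, in the order
# (true positives, false negatives, false positives, true negatives),
# where "positive" means the patch fails at least one unit test.
MODELS = {
    "testfailure": (614, 527, 2155, 4863),
    "testoverall": (810, 331, 2413, 4605),
}


def recall(tp, fn, fp, tn):
    """Share of actually failing patches that the model flags as failing."""
    return tp / (tp + fn)


def precision(tp, fn, fp, tn):
    """Share of patches flagged as failing that actually fail."""
    return tp / (tp + fp)


def summarize(models=MODELS):
    """Return {model name: (recall, precision)} for each model."""
    return {name: (recall(*c), precision(*c)) for name, c in models.items()}


if __name__ == "__main__":
    for name, (r, p) in summarize().items():
        print(f"{name}: recall={r:.3f} precision={p:.3f}")
```

On these counts, recall rises from roughly 54% (testfailure) to roughly 71% (testoverall), with a small gain in precision as well.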
A smart inference step is built on testlabelselect and testoverall predictions. By setting a threshold on the number of unit tests predicted to fail, 91% of failures can be captured while reducing computation by 57%.
| | Fail (Predicted) | Pass (Predicted) |
|---|---|---|
| Fail (Actual) | 10617 | 1054 |
| Pass (Actual) | 30103 | 39815 |
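The two headline figures can be checked against the table above. The sketch below assumes that "failures captured" means the share of actual failures predicted as fail. The second ratio, the share of actual passes predicted as pass, is the one that matches the 57% figure numerically; that reading is an inference from the counts, not something the text states.

```python
# Counts copied from the smart inference table above.
TP, FN = 10617, 1054    # actual failures: predicted fail / predicted pass
FP, TN = 30103, 39815   # actual passes:   predicted fail / predicted pass

# Share of actual failures that the smart inference flags as failing.
failure_capture = TP / (TP + FN)

# Share of actual passes that the smart inference predicts as passing
# (assumed here to be what the 57% figure refers to).
pass_recognised = TN / (FP + TN)

if __name__ == "__main__":
    print(f"failures captured: {failure_capture:.1%}")
    print(f"passes predicted as pass: {pass_recognised:.1%}")
```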
Currently, the smart inference is integrated into Jenkins to save computation. If a patch is likely to fail any unit test, the sequential fast track is run, on the assumption that the patch will fail some unit tests and there is no need to run everything. If it is likely to pass, the normal track is run to ensure code correctness.
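The decision described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the repository or in Jenkins: the function names, the 0.5 probability cutoff and the default threshold of one test are hypothetical, and the sketch uses only per-test probabilities, leaving out how testoverall predictions are combined, which the text does not specify.

```python
def count_predicted_failures(test_probs, prob_cutoff=0.5):
    """Count unit tests whose predicted failing probability reaches the cutoff.

    test_probs maps a unit test name to its predicted failing probability.
    The cutoff value is an assumption made for this sketch.
    """
    return sum(1 for p in test_probs.values() if p >= prob_cutoff)


def patch_likely_fails(test_probs, min_failed_tests=1, prob_cutoff=0.5):
    """Flag the patch when enough unit tests are predicted to fail."""
    return count_predicted_failures(test_probs, prob_cutoff) >= min_failed_tests


def choose_track(test_probs, min_failed_tests=1, prob_cutoff=0.5):
    """Pick the CI track: fast track for likely failures, normal otherwise."""
    if patch_likely_fails(test_probs, min_failed_tests, prob_cutoff):
        return "sequential fast track"
    return "normal track"


if __name__ == "__main__":
    # Hypothetical per-test probabilities for one patch.
    probs = {"test_a": 0.92, "test_b": 0.10, "test_c": 0.03}
    print(choose_track(probs))
```

Raising min_failed_tests makes the fast track rarer, so more patches go through the full normal track; lowering it sends more patches to the fast track.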
testlabelselect is not used directly to select unit tests because it cannot capture all failures: about 5% of failures would escape, which could cause severe problems.
Install build-essential and zstd:
sudo apt install build-essential
sudo apt install zstd
Clone libreoffice:
git clone https://gerrit.libreoffice.org/core libreoffice
Install Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
export PATH="$HOME/.cargo/bin:$PATH"
Install rust-code-analysis:
cargo install rust-code-analysis-cli rust-code-analysis-web
Install conda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Clone libreoffice-ci:
git clone https://github.com/baolef/libreoffice-ci.git
cd libreoffice-ci
Install Python dependencies:
conda env create -f environment.yml
conda activate libreoffice-ci
To extract features for past Gerrit pushes, first extract data/jenkinsfullstats.csv from data/jenkinsfullstats.csv.xz, then run:
python dataset/mining.py --path ../libreoffice
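The decompression step that precedes the mining command can be done with any xz tool. As one option, here is a minimal sketch using only the Python standard library; the helper name decompress_xz is illustrative and is not part of the repository.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dst=None):
    """Decompress an .xz file and return the path of the output file.

    When dst is omitted, the output is written next to the source with the
    trailing ".xz" suffix removed.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    # Stream the data so large files do not have to fit in memory.
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling decompress_xz("data/jenkinsfullstats.csv.xz") from the repository root would write data/jenkinsfullstats.csv.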
To extract all unit tests, first extract the push features (data/commits.json), then run:
python dataset/mapping.py
To extract features for unit tests, first extract the push features (data/commits.json) and data/tests.json, then run:
python dataset/test_history.py --path data/commits.json
To convert one database format (e.g. data/commits.json) into another (e.g. data/commits.pickle.zstd):
python dataset/convert.py data/commits.json data/commits.pickle.zstd
To train a model (e.g. testlabelselect, testoverall) after extracting the necessary data:
python train.py testlabelselect
python train.py testoverall
Training a model on the full dataset may be time- and memory-consuming; the --limit argument can be used to train on a subset:
python train.py testlabelselect --limit 16384
Detailed training scripts are available for ungrouped data (scripts/train.sh) and grouped data (scripts/train_group.sh).
To run inference with a model (e.g. testlabelselect) for a commit hash (e.g. a772976f047882918d5386a3ef9226c4aa2aa118) after training the necessary models (e.g. testlabelselect, testoverall):
python test.py testlabelselect --revision a772976f047882918d5386a3ef9226c4aa2aa118
If no commit hash is specified, inference is performed on the last commit.
A detailed inference script is available in scripts/test.sh.