Baal is an active learning library initially developed at ElementAI (acquired by ServiceNow in 2021).
Our goal is to support both industrial applications and research in active learning.
Read the documentation at https://baal.readthedocs.io.
Our paper can be read on arXiv. It includes tips and tricks to make active learning usable in production.
For a quick introduction to Baal and Bayesian active learning, please see these links:
Installation and requirements
Baal requires Python>=3.7.
To install Baal using pip: pip install baal
We use Poetry as our package manager.
To install Baal from source: poetry install
Papers using Baal
- Bayesian active learning for production, a systematic study and a reusable library (Atighehchian et al. 2020)
- Synbols: Probing Learning Algorithms with Synthetic Datasets (Lacoste et al. 2020)
- Can Active Learning Preemptively Mitigate Fairness Issues? (Branchaud-Charron et al. 2021)
- Active learning with MaskAL reduces annotation effort for training Mask R-CNN (Blok et al. 2021)
- Stochastic Batch Acquisition for Deep Active Learning (Kirsch et al. 2022)
What is active learning?
Active learning is a special case of machine learning in which a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs (labels) at new data points. To understand the concept in more depth, refer to our tutorial.
Baal Framework
At the moment Baal supports the following methods to perform active learning.
- Monte-Carlo Dropout (Gal et al. 2015)
- MCDropConnect (Mobiny et al. 2019)
- Deep ensembles
- Semi-supervised learning
If you want to propose new methods, please submit an issue.
The Monte-Carlo Dropout method is a known approximation for Bayesian neural networks. In this method, the Dropout layer is active at both training and test time. By running the model multiple times while randomly dropping weights, we estimate the uncertainty of the prediction using one of the uncertainty measures in heuristics.py.
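As a rough illustration of how repeated stochastic predictions can be turned into an uncertainty score, the sketch below computes a BALD-style score (entropy of the averaged prediction minus the average entropy of each pass) in plain Python. The names `entropy` and `bald_score` and the input layout are assumptions made for this example; Baal's own measures live in heuristics.py and may be implemented differently.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs):
    """BALD-style score for a single input.

    mc_probs: one list of class probabilities per stochastic forward pass
    (hypothetical layout: [n_iterations][n_classes]).
    """
    n_iter = len(mc_probs)
    n_classes = len(mc_probs[0])
    # Entropy of the averaged prediction: total predictive uncertainty.
    mean_probs = [sum(p[c] for p in mc_probs) / n_iter for c in range(n_classes)]
    predictive_entropy = entropy(mean_probs)
    # Average entropy of the individual passes: uncertainty left even when passes agree.
    expected_entropy = sum(entropy(p) for p in mc_probs) / n_iter
    return predictive_entropy - expected_entropy


# Passes that agree score close to zero; passes that disagree score high.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[1.0, 0.0], [0.0, 1.0]]
print(bald_score(agree), bald_score(disagree))
```

Inputs whose passes disagree are the ones the model is least sure about, which makes them good candidates for labelling.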
The framework consists of four main parts, as demonstrated in the flowchart below:
- ActiveLearningDataset
- Heuristics
- ModelWrapper
- ActiveLearningLoop
To get started, wrap your dataset in our ActiveLearningDataset class. This will ensure that the dataset is split into training and pool sets. The pool set represents the portion of the training set which is yet to be labelled.
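To make the training/pool split concrete, here is a minimal sketch of the bookkeeping involved. `TinyActiveDataset` is a hypothetical stand-in written for this example only; it is not Baal's ActiveLearningDataset and makes no claim about how that class is implemented.

```python
import random


class TinyActiveDataset:
    """Hypothetical stand-in that tracks which items are labelled."""

    def __init__(self, items):
        self.items = list(items)
        self.labelled = [False] * len(self.items)

    def _unlabelled_indices(self):
        return [i for i, done in enumerate(self.labelled) if not done]

    def label_randomly(self, n, seed=None):
        """Mark n randomly chosen unlabelled items as labelled."""
        rng = random.Random(seed)
        for i in rng.sample(self._unlabelled_indices(), n):
            self.labelled[i] = True

    def label(self, pool_indices):
        """Label items given their positions within the current pool."""
        unlabelled = self._unlabelled_indices()
        for pool_idx in pool_indices:
            self.labelled[unlabelled[pool_idx]] = True

    @property
    def training(self):
        """Items that already have a label."""
        return [x for x, done in zip(self.items, self.labelled) if done]

    @property
    def pool(self):
        """Items still waiting for a label."""
        return [x for x, done in zip(self.items, self.labelled) if not done]


dataset = TinyActiveDataset(range(10))
dataset.label_randomly(3, seed=0)
print(len(dataset.training), len(dataset.pool))  # 3 7
```

Every item is in exactly one of the two sets, and labelling an item moves it from the pool to the training set.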
We provide a lightweight object, ModelWrapper, similar to keras.Model, to make it easier to train and test the model. If your model is not ready for active learning, we provide Modules to prepare it. For example, the MCDropoutModule wrapper changes the existing dropout layers to be used at both training and inference time, and the ModelWrapper specifies the number of iterations to run at training and inference.
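The idea of keeping dropout active at inference and running several iterations can be sketched without any deep learning framework. The toy classifier below applies dropout to its input features on every pass; the function names, the weights, and the choice of dropping inputs rather than internal units are all assumptions for illustration, not a description of MCDropoutModule or ModelWrapper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy linear classifier with dropout on its inputs."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob, rescale the rest.
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def predict_with_dropout(features, weights, iterations, drop_prob=0.5, seed=0):
    """Run several stochastic passes and return one prediction per pass."""
    rng = random.Random(seed)
    return [stochastic_forward(features, weights, drop_prob, rng) for _ in range(iterations)]


weights = [[2.0, -1.0, 0.5], [-1.0, 1.5, 0.0]]  # hypothetical 2-class, 3-feature model
samples = predict_with_dropout([1.0, 2.0, 3.0], weights, iterations=20)
print(len(samples), len(samples[0]))  # 20 2
```

With dropout disabled (`drop_prob=0.0`) every pass returns the same prediction; the spread across passes only appears when dropout stays on at inference.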
In conclusion, your script should be similar to this:
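from baal.active import ActiveLearningDataset, heuristics
from baal.active.active_loop import ActiveLearningLoop
from baal.bayesian.dropout import MCDropoutModule
from baal.modelwrapper import ModelWrapper

# your_dataset, your_model, your_criterion, optimizer and the upper-case
# constants are placeholders that you define yourself.
dataset = ActiveLearningDataset(your_dataset)
dataset.label_randomly(INITIAL_POOL)  # label some data
model = MCDropoutModule(your_model)
model = ModelWrapper(model, your_criterion)
active_loop = ActiveLearningLoop(dataset,
                                 get_probabilities=model.predict_on_dataset,
                                 heuristic=heuristics.BALD(shuffle_prop=0.1),
                                 query_size=NDATA_TO_LABEL)
for al_step in range(N_ALSTEP):
    model.train_on_dataset(dataset, optimizer, BATCH_SIZE, use_cuda=use_cuda)
    if not active_loop.step():
        # We're done!
        break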
For a complete experiment, see experiments/ to understand how to write an active training process. Generally, we use the ActiveLearningLoop provided at src/baal/active/active_loop.py. This class gets the predictions on the unlabelled pool after every epoch (or every few epochs) and ranks the pool items by their calculated uncertainty to select the next set of items to be labelled.
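The ranking step described above can be sketched as follows: score every pool item, pick the `query_size` most uncertain ones, and move them to the labelled set. `select_most_uncertain` and `loop_step` are hypothetical helpers written for this example; the actual logic is in src/baal/active/active_loop.py and may differ.

```python
def select_most_uncertain(scores, query_size):
    """Return the pool indices of the query_size highest uncertainty scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:query_size]


def loop_step(pool, labelled, score_fn, query_size):
    """Move the most uncertain pool items to the labelled set.

    Returns False when the pool is empty, so the caller knows to stop.
    """
    if not pool:
        return False
    scores = [score_fn(item) for item in pool]
    chosen = set(select_most_uncertain(scores, query_size))
    labelled.extend(item for i, item in enumerate(pool) if i in chosen)
    pool[:] = [item for i, item in enumerate(pool) if i not in chosen]
    return True


# Hypothetical scores: predicted probabilities closer to 0.5 count as more uncertain.
pool = [0.05, 0.48, 0.93, 0.55, 0.20]
labelled = []
while loop_step(pool, labelled, lambda p: -abs(p - 0.5), query_size=2):
    pass
print(labelled)
```

The boolean return value mirrors the `if not active_loop.step(): break` pattern in the script above: the loop ends once nothing is left to label.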
Re-run our Experiments
docker build [--target base_baal] -t baal .
docker run --rm --gpus all baal python3 experiments/vgg_mcdropout_cifar10.py
Use Baal for YOUR Experiments
Simply clone the repo and create your own experiment script similar to the example at experiments/vgg_experiment.py. Make sure to use the four main parts of the Baal framework. Happy experimenting!
Contributing!
To contribute, see CONTRIBUTING.md.
Who We Are!
"There is passion, yet peace; serenity, yet emotion; chaos, yet order."
The Baal team tests and implements the most recent papers on uncertainty estimation and active learning.
Current maintainers:
How to cite
If you used Baal in one of your projects, we would greatly appreciate it if you cited this library using this BibTeX entry:
@misc{atighehchian2019baal,
  title={Baal, a bayesian active learning library},
  author={Atighehchian, Parmida and Branchaud-Charron, Frederic and Freyberg, Jan and Pardinas, Rafael and Schell, Lorne and Pearse, George},
  year={2022},
  howpublished={\url{https://github.com/baal-org/baal/}},
}
Licence
For information on the licence of this API, please read LICENCE.