Kili AutoML is a lightweight library to create ML models in a data-centric AI way:
- Label on Kili
- Train a model with AutoML and evaluate its performance in one line of code
- Push predictions to Kili to accelerate the labeling in one line of code
- Prioritize labeling on Kili so that the assets most likely to improve your model are labeled first
Iterate.
Once you are satisfied with the performance, serve the model in one line of code and monitor it while keeping a human in the loop with Kili.
Only works on Linux and Mac OS X
You can try automl on a simple image classification project with this notebook.
Check the Demos section for more examples.
Creating a new conda or virtualenv before cloning is recommended because we install a lot of packages:
conda create --name automl python=3.7
conda activate automl
git clone https://github.com/kili-technology/automl.git
cd automl
git submodule update --init
Then install the requirements:
export SETUPTOOLS_ENABLE_FEATURES="legacy-editable"
pip install torch && pip install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd)
We made Kili AutoML very simple to use. The following sections detail how to call the main methods.
We train the model with the following command:
kiliautoml train \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
By default, the library uses Weights and Biases to track the training and the quality of the predictions.
The model is then stored in the cache of the AutoML library (default location: HOME/.cache/kili/automl, but you can choose the location with the environment variable KILIAUTOML_CACHE).
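If you need to locate the cached models from a script, the lookup boils down to the following (a minimal sketch, not the library's actual code):

```python
import os
from pathlib import Path

# Default cache location as described above; KILIAUTOML_CACHE overrides it.
default_cache = Path.home() / ".cache" / "kili" / "automl"
cache_dir = Path(os.environ.get("KILIAUTOML_CACHE", default_cache))
print(cache_dir)
```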
Kili AutoML training does the following:
- Selects the models related to the tasks declared in the project ontology.
- Retrieves Kili's asset data and converts it into the input format of each model.
- Fine-tunes the model on the input data.
- Outputs the model metrics.
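As a rough illustration of that flow (every function and field name below is hypothetical, not the actual kiliautoml internals):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskSpec:
    job_name: str   # name of the job in the Kili ontology, e.g. "CLASSIFICATION_JOB"
    ml_task: str    # e.g. "CLASSIFICATION", "NAMED_ENTITIES_RECOGNITION", "OBJECT_DETECTION"


def select_model(task: TaskSpec) -> str:
    """Step 1: pick a model family from the ontology task type (illustrative mapping)."""
    defaults = {
        "CLASSIFICATION": "text-or-image-classifier",
        "NAMED_ENTITIES_RECOGNITION": "token-classifier",
        "OBJECT_DETECTION": "object-detector",
    }
    return defaults.get(task.ml_task, "unsupported")


def convert_assets(assets: List[Dict], task: TaskSpec) -> List[Dict]:
    """Step 2: keep labeled assets and map them to the model's input format."""
    return [
        {"content": a["content"], "target": a["latest_label"][task.job_name]}
        for a in assets
        if a.get("latest_label")
    ]


def train(task: TaskSpec, assets: List[Dict]) -> Dict:
    """Steps 3 and 4: fine-tune the selected model and report metrics."""
    model_name = select_model(task)
    examples = convert_assets(assets, task)
    # ... actual fine-tuning with the chosen ML backend would happen here ...
    return {"model": model_name, "n_training_examples": len(examples)}
```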
You can check the supported ML backends and the tasks they are used for here.
You can monitor the model loss to infer when you can stop labeling.
Once trained, the models are used to predict labels and add pre-annotations to the assets that have not yet been labeled by the annotators. The annotators can then validate or correct these pre-annotations in the Kili user interface.
kiliautoml predict \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
Using trained models to push pre-annotations onto unlabeled assets typically speeds up labeling by 10%.
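For reference, the push itself goes through the Kili Python SDK. A minimal sketch of what it looks like, assuming a single classification job named CLASSIFICATION_JOB, a category named OK, and the SDK's create_predictions method (check the method name and signature for your SDK version):

```python
import os
from kili.client import Kili

kili = Kili(api_key=os.environ["KILI_API_KEY"])

# json_response must follow the project ontology; this toy payload assumes a
# classification job "CLASSIFICATION_JOB" with a category named "OK".
kili.create_predictions(
    project_id=os.environ["KILI_PROJECT_ID"],
    external_id_array=["asset-1"],
    model_name_array=["kiliautoml-model"],
    json_response_array=[{"CLASSIFICATION_JOB": {"categories": [{"name": "OK"}]}}],
)
```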
You can also use a model trained on another project, provided both projects share the same ontology:
kiliautoml predict \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID \
--from-project $ANOTHER_KILI_PROJECT_ID
Once roughly 10 percent of a project's assets have been labeled, you can reorder the labeling queue so that the assets most likely to improve the model's performance are labeled first.
kiliautoml prioritize \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
This command changes the priority queue of the assets to be labeled. To do this, AutoML uses a mix of diversity sampling and uncertainty sampling.
Note: for image classification, object detection and image segmentation projects only.
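For intuition, here is a toy version of mixing uncertainty sampling and diversity sampling: score each unlabeled asset by the entropy of the model's predictions and by how far its embedding sits from the bulk of the data, then label the highest-scoring assets first. This is an illustration, not the library's actual strategy:

```python
import numpy as np


def _minmax(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-12)


def priority_scores(probs: np.ndarray, embeddings: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """probs: (n_assets, n_classes) predicted probabilities for unlabeled assets.
    embeddings: (n_assets, d) feature vectors for the same assets.
    Returns one score per asset; higher scores should be labeled first."""
    # Uncertainty sampling: entropy of the predicted class distribution.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Diversity sampling (crude proxy): distance to the mean embedding.
    distance = np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)
    return alpha * _minmax(entropy) + (1 - alpha) * _minmax(distance)


# 4 unlabeled assets, 3 classes, 2-d embeddings.
probs = np.array([[0.90, 0.05, 0.05], [0.34, 0.33, 0.33], [0.60, 0.20, 0.20], [0.50, 0.40, 0.10]])
embeddings = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]])
print(priority_scores(probs, embeddings))  # the highest-scoring assets get labeled first
```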
Labeling mistakes happen. Fortunately, we provide methods to detect potential annotation problems. label_errors.py identifies potentially mislabeled assets and creates a 'potential_label_error' filter on the project's asset exploration view:
kiliautoml label_errors \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
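Conceptually, this kind of check compares each asset's human label with the model's (out-of-fold) predicted probability for that label and flags strong disagreements. A toy sketch, not the library's actual implementation:

```python
import numpy as np


def flag_label_errors(probs: np.ndarray, given_labels: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """probs: (n_assets, n_classes) out-of-fold predicted probabilities.
    given_labels: (n_assets,) class index assigned by the annotators.
    Returns the indices of assets whose given label looks suspicious."""
    prob_of_given_label = probs[np.arange(len(given_labels)), given_labels]
    return np.where(prob_of_given_label < threshold)[0]


probs = np.array([[0.95, 0.05], [0.02, 0.98], [0.50, 0.50]])
labels = np.array([0, 0, 1])  # asset 1 is labeled 0, but the model is confident it is class 1
print(flag_label_errors(probs, labels))  # -> [1]
```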
AutoML currently supports the following tasks:
- Natural Language Processing (NLP)
- Named Entity Recognition
- Text Classification
- Image
- Object Detection
- Image Classification
- Semantic Segmentation
You can test the features of AutoML with these notebooks:
- Natural Language Processing (NLP)
- Image
AutoML is a utility library that trains and serves models. It is your responsibility to determine whether the model performance is high enough or not.
Don't hesitate to contribute!