This project implements an ecosystem for multi-annotator learning approaches.
- `data_collection`: scripts to emulate or adjust our data collection, including the annotation campaign via Label Studio
  - `label_studio_interfaces`: scripts to perform an annotation campaign via Label Studio using the example of the dataset `dopanim`
    - `annotation.xml`: code for the annotation interface
    - `post-questionnaire.xml`: code for the post-questionnaire interface
    - `pre-questionnaire.xml`: code for the pre-questionnaire interface
  - `python_scripts`: scripts to download task data from iNaturalist and to prepare it for annotation via Label Studio
    - `annotation_tasks.py`: script to create batches of annotation tasks for the upload to Label Studio
    - `download.py`: script to download data from iNaturalist
    - `preprocessing.py`: script to preprocess downloaded data
    - `taxons.py`: contains taxon names and IDs to be downloaded from iNaturalist
- `empirical_evaluation`: scripts to reproduce or adjust our empirical evaluation, including the benchmark and case studies
  - `hydra_configs`: collection of `hydra` config files for defining hyperparameters
    - `architecture`: config group of config files for network architectures
    - `classifier`: config group of config files for multi-annotator classification approaches
    - `data`: config group of config files for datasets
    - `ssl_model`: config group of config files for self-supervised learning models as backbones
    - `experiment.yaml`: config file to define the architecture(s), dataset, and multi-annotator classification approach for an experiment
  - `jupyter_notebooks`: Jupyter notebooks to analyze results or use cases
    - `analyze_collected_data.ipynb`: Jupyter notebook to analyze the dataset `dopanim`
    - `annotation_times_active_learning.ipynb`: Jupyter notebook to reproduce the use case on annotation times in active learning for the dataset `dopanim`
    - `t_sne_features.ipynb`: Jupyter notebook to create the t-SNE plots of self-supervised features for the dataset `dopanim`
    - `tabular_results.ipynb`: Jupyter notebook to create the tables of results obtained after executing the experiments for the dataset `dopanim`
  - `python_scripts`: collection of scripts to perform the experimental evaluation
    - `perform_experiment.py`: script to execute a single experiment for a given configuration
    - `write_bash_scripts.py`: script to write Bash or Slurm scripts for the evaluation
- `maml`: Python package for multi-annotator machine learning consisting of several sub-packages
  - `architectures`: implementations of network architectures for the ground truth and annotator performance models
  - `classifiers`: implementations of multi-annotator machine learning approaches using `pytorch_lightning` modules
  - `data`: implementations of `pytorch` data sets with class labels provided by multiple, error-prone annotators
  - `utils`: helper functions, e.g., for visualization
- `environment.yml`: file containing all package details to create a `conda` environment
As a prerequisite, we assume a Linux distribution as the operating system.

- Download a `conda` version to be installed on your machine.
- Set up the environment via
  ```bash
  projectpath$ conda env create -f environment.yml
  ```
- Activate the new environment:
  ```bash
  projectpath$ conda activate maml
  ```
- Verify that the `maml` (multi-annotator machine learning) environment was installed correctly:
  ```bash
  projectpath$ conda env list
  ```
Using the `dopanim` dataset as an example, we provide scripts to download task data from iNaturalist and to annotate this data via Label Studio.
1. Check the file `taxons.py` and adjust the taxon IDs and names according to your preferences.
2. Inspect the parameters of the script `download.py` and adjust them to your preferences. For example, you can download only one page with 20 observations per class and a request time interval of 1 s via
   ```bash
   projectpath$ conda activate maml
   projectpath$ cd data_collection/python_scripts
   projectpath/data_collection/python_scripts$ python download.py --n_pages 1 --per_page 20 --request_time_interval 1
   ```
   Keep the `request_time_interval` large enough to satisfy the requirements of the iNaturalist API (see the illustrative sketch at the end of this section).
3. Inspect the parameters of the script `preprocessing.py` and adjust them to your preferences. For example, you can define 1 validation and 1 test sample per class via
   ```bash
   projectpath$ conda activate maml
   projectpath$ cd data_collection/python_scripts
   projectpath/data_collection/python_scripts$ python preprocessing.py --no_of_test_images_per_taxon 1 --no_of_validation_images_per_taxon 1
   ```
4. Inspect the parameters of the script `annotation_tasks.py` and adjust them to your preferences. For example, you can extract the annotation tasks for the batches `0` and `1` via
   ```bash
   projectpath$ conda activate maml
   projectpath$ cd data_collection/python_scripts
   projectpath/data_collection/python_scripts$ python annotation_tasks.py --batches "[0,1]"
   ```

The obtained batches can then be uploaded to Label Studio to be manually assigned to certain annotators. Furthermore, you can upload and employ the corresponding interfaces in `label_studio_interfaces`. We refer to the documentation of Label Studio for the exact steps of setting up the annotation platform.
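For orientation only, the following Python sketch illustrates the general mechanism a script like `download.py` relies on; it is not the actual implementation. It queries the public iNaturalist API for observations of one taxon and waits between requests. The taxon ID, query filters, and collected fields are illustrative assumptions.

```python
import time

import requests

# Placeholder values; adjust the taxon ID, pages, and interval to your needs.
TAXON_ID = 12345           # hypothetical iNaturalist taxon ID
N_PAGES = 1                # corresponds to --n_pages
PER_PAGE = 20              # corresponds to --per_page
REQUEST_TIME_INTERVAL = 1  # seconds between requests, corresponds to --request_time_interval

photo_urls = []
for page in range(1, N_PAGES + 1):
    # Query the public iNaturalist observations endpoint.
    response = requests.get(
        "https://api.inaturalist.org/v1/observations",
        params={
            "taxon_id": TAXON_ID,
            "per_page": PER_PAGE,
            "page": page,
            "photos": "true",
            "quality_grade": "research",
        },
        timeout=30,
    )
    response.raise_for_status()
    for observation in response.json().get("results", []):
        for photo in observation.get("photos", []):
            photo_urls.append(photo.get("url"))
    # Wait between requests to respect the iNaturalist API usage recommendations.
    time.sleep(REQUEST_TIME_INTERVAL)

print(f"Collected {len(photo_urls)} photo URLs.")
```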
We provide scripts and Jupyter notebooks to benchmark and visualize multi-annotator machine learning approaches on datasets annotated by multiple error-prone annotators.
The Python script for executing a single experiment is `perform_experiment.py`, and the corresponding main config file is `evaluation`. In this config file, you also need to specify `mlruns_path`, which defines the path where the results are to be saved via `mlflow`. Further, you have the option to select `gpu` or `cpu` as the `accelerator`.
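The experiment commands below pass hydra overrides such as `data=dopanim` or `classifier=majority_vote`, which select config files from the config groups listed in the project structure above. As a minimal, hedged sketch of this mechanism (not the project's actual `perform_experiment.py`; the config path and name here are assumptions based on the structure above):

```python
# Minimal hydra entry-point sketch; structure and names are assumptions.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="../hydra_configs", config_name="experiment", version_base=None)
def main(cfg: DictConfig) -> None:
    # Overrides such as `data=dopanim classifier=majority_vote seed=0` select entries
    # from the config groups (architecture, classifier, data, ssl_model) and set values.
    print(OmegaConf.to_yaml(cfg))
    # The resolved config would then determine the dataset, model, and training setup,
    # and results would be logged via mlflow to the configured mlruns_path.


if __name__ == "__main__":
    main()
```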
1. Before starting a single experiment or Jupyter notebook, check whether the dataset is already downloaded. For example, if you want to ensure that the dataset `dopanim` is downloaded, update the `download` flag in its config file `dopanim.yaml`.
2. An experiment can then be started by executing the following commands:
   ```bash
   projectpath$ conda activate maml
   projectpath$ cd empirical_evaluation/python_scripts
   projectpath/empirical_evaluation/python_scripts$ python perform_experiment.py data=dopanim data.class_definition.variant="full" classifier=majority_vote seed=0
   ```
3. Since there are many different experimental configurations, including ten repetitions with different seeds, you can create Bash scripts by following the instructions in `write_bash_scripts.py` and then executing the following commands:
   ```bash
   projectpath$ conda activate maml
   projectpath$ cd empirical_evaluation/python_scripts
   projectpath/empirical_evaluation/python_scripts$ python write_bash_scripts.py
   ```
4. There is a Bash script for the hyperparameter search, for each dataset variant of the benchmark, and for the use cases. For example, the benchmark experiments for the variant `full` can be executed via Slurm according to
   ```bash
   projectpath$ conda activate maml
   projectpath$ sbatch path_to_bash_scripts/dopanim_benchmark_full.sh
   ```
Once an experiment is completed, its associated results can be loaded via `mlflow` (a programmatic example is sketched below). For getting a tabular presentation of these results, you need to start the Jupyter notebook `tabular_results.ipynb` and follow its instructions.

```bash
projectpath$ conda activate maml
projectpath$ cd empirical_evaluation/jupyter_notebooks
projectpath/empirical_evaluation/jupyter_notebooks$ jupyter-notebook tabular_results.ipynb
```
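Besides the notebook, completed runs can also be inspected programmatically with `mlflow`. A minimal, hedged sketch (the tracking path is a placeholder for your configured `mlruns_path`, and the available columns depend on what was actually logged):

```python
import mlflow

# Point mlflow at the directory configured as mlruns_path in the main config file.
mlflow.set_tracking_uri("file:/path/to/mlruns")  # placeholder path

# Load all runs across all experiments into a pandas DataFrame.
runs = mlflow.search_runs(search_all_experiments=True)

# Standard columns of the returned DataFrame; logged metrics and parameters appear
# as additional `metrics.*` and `params.*` columns.
print(runs[["run_id", "status", "start_time"]].head())
```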
For reproducing the confusion matrices of the top-label predictions, reliability diagrams of the likelihoods, and histograms of annotation times, you need to start the Jupyter notebook `analyze_collected_data.ipynb` and follow its instructions.

```bash
projectpath$ conda activate maml
projectpath$ cd empirical_evaluation/jupyter_notebooks
projectpath/empirical_evaluation/jupyter_notebooks$ jupyter-notebook analyze_collected_data.ipynb
```
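The notebook contains the actual analysis of the collected data. Purely as a hedged illustration of one of its building blocks, a row-normalized confusion matrix of top-label predictions can be computed from label arrays as follows; the arrays and class names are dummies:

```python
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Dummy data: true classes and annotators' top-label predictions for six tasks.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Row-normalized confusion matrix of the top-label predictions.
cm = confusion_matrix(y_true, y_pred, normalize="true")
ConfusionMatrixDisplay(cm, display_labels=["class_0", "class_1", "class_2"]).plot()
```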
For reproducing the t-SNE plot of the self-supervised features learned by the DINOv2 ViT-S/14, you need to start the Jupyter notebook `t_sne_features.ipynb` and follow its instructions.

```bash
projectpath$ conda activate maml
projectpath$ cd empirical_evaluation/jupyter_notebooks
projectpath/empirical_evaluation/jupyter_notebooks$ jupyter-notebook t_sne_features.ipynb
```
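Again, the notebook produces the actual plot. As a generic, hedged sketch of the technique, the following uses random dummy features instead of the DINOv2 ViT-S/14 features of `dopanim` (only the 384-dimensional size mirrors the ViT-S/14 embedding dimension):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Dummy stand-ins for the self-supervised features and class labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 384))
labels = rng.integers(0, 10, size=500)

# Project the high-dimensional features onto two dimensions.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE of (dummy) self-supervised features")
plt.show()
```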
For reproducing the use case study on annotation times in active learning, you need to start the Jupyter notebook `annotation_times_active_learning.ipynb` and follow its instructions.

```bash
projectpath$ conda activate maml
projectpath$ cd empirical_evaluation/jupyter_notebooks
projectpath/empirical_evaluation/jupyter_notebooks$ jupyter-notebook annotation_times_active_learning.ipynb
```
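The notebook implements the actual study. Purely as a hedged, self-contained toy of the general idea, the following sketch performs uncertainty sampling while tracking the cumulative annotation time; all data, times, and budgets are dummies unrelated to `dopanim`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                         # dummy features
y = ((X[:, 0] + 0.5 * rng.normal(size=300)) > 0) * 1   # dummy labels
annotation_times = rng.uniform(2.0, 10.0, size=300)    # dummy per-instance annotation times (s)

# Start with a small, class-balanced labeled set.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(300) if i not in labeled]
spent_time = annotation_times[labeled].sum()

for _ in range(20):  # query 20 additional annotations
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probas = clf.predict_proba(X[unlabeled])
    # Uncertainty sampling: query the instance with the least confident prediction.
    query = unlabeled[int(np.argmin(probas.max(axis=1)))]
    labeled.append(query)
    unlabeled.remove(query)
    spent_time += annotation_times[query]

print(f"Labeled {len(labeled)} instances, cumulative annotation time: {spent_time:.1f} s")
```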
If you encounter any problems, watch out for any `TODO` comments, which give hints or instructions to ensure the functionality of the code. If the problems are still not resolved, feel free to create a corresponding GitHub issue or contact us directly via e-mail: marek.herde@uni-kassel.de.