Control the quality of your labeled data with the Python tools you already know.

Crowd-Kit: Computational Quality Control for Crowdsourcing


Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the related metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with aggregate;
  • loaders for popular crowdsourced datasets.
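To give a feel for what an uncertainty metric measures, here is a minimal sketch that computes the Shannon entropy of the labels submitted for a single task. It uses only the standard library and is an illustration, not Crowd-Kit's implementation; the function name label_entropy is hypothetical, and the library's own metrics may be defined differently.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the labels given to one task.

    Hypothetical helper for illustration only; not part of Crowd-Kit.
    0.0 means all workers agree; higher values mean more disagreement.
    """
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1 / p) summed over the observed labels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


unanimous = label_entropy(["cat", "cat", "cat"])
split = label_entropy(["cat", "dog"])
print(unanimous, split)
```

A task with high entropy is one where workers disagree, so it is a natural candidate for extra annotations or manual review.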

In addition, the learning subpackage contains PyTorch implementations of methods for deep learning from crowds and of advanced aggregation algorithms.

Installing

To install Crowd-Kit, run the following command: pip install crowd-kit. If you also want to use the learning subpackage, run pip install 'crowd-kit[learning]' (the quotes keep shells such as zsh from interpreting the square brackets).

If you are interested in contributing to Crowd-Kit, use uv to manage the dependencies:

uv venv
uv pip install -e '.[dev,docs,learning]'
uv tool run pre-commit install

We use pytest for testing and a variety of linters, including pre-commit, Black, isort, Flake8, pyupgrade, and nbQA, to simplify code maintenance.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
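The expected layout is one row per (task, worker, label) judgment. The sketch below shows that layout and checks for the three required columns before any aggregation is attempted. It uses only the standard library, the sample data is made up, and read_annotations is a hypothetical helper, not a Crowd-Kit function.

```python
import csv
import io

# Column names taken from the text above.
REQUIRED_COLUMNS = ("task", "worker", "label")

# Made-up sample: three workers label two tasks.
SAMPLE_CSV = """task,worker,label
t1,w1,relevant
t1,w2,relevant
t1,w3,irrelevant
t2,w1,irrelevant
t2,w2,irrelevant
t2,w3,irrelevant
"""


def read_annotations(text):
    """Parse CSV text and fail early if a required column is missing.

    Hypothetical helper for illustration only; not part of Crowd-Kit.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = read_annotations(SAMPLE_CSV)
print(len(rows), rows[0])
```

A file in this shape can be passed to pd.read_csv as in the snippet above.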

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn naming convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
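Whatever the algorithm, aggregation reduces many worker responses to one label per task. As a point of comparison with Dawid-Skene, here is a minimal standard-library sketch of the simplest strategy, majority vote. It is not Crowd-Kit's implementation: the function name majority_vote is hypothetical, and the alphabetical tie-breaking rule is an assumption made only to keep the output deterministic.

```python
from collections import Counter, defaultdict


def majority_vote(rows):
    """Return {task: most frequent label} for (task, worker, label) tuples.

    Hypothetical helper for illustration only; not part of Crowd-Kit.
    Ties are broken alphabetically (an assumption of this sketch).
    """
    votes = defaultdict(Counter)
    for task, _worker, label in rows:
        votes[task][label] += 1

    result = {}
    for task, counter in votes.items():
        top = max(counter.values())
        result[task] = min(label for label, n in counter.items() if n == top)
    return result


# Made-up responses: three workers, two tasks.
rows = [
    ("t1", "w1", "cat"),
    ("t1", "w2", "cat"),
    ("t1", "w3", "dog"),
    ("t2", "w1", "dog"),
    ("t2", "w2", "dog"),
    ("t2", "w3", "cat"),
]
print(majority_vote(rows))  # {'t1': 'cat', 't2': 'dog'}
```

Majority vote treats every worker as equally reliable, whereas Dawid-Skene also estimates how reliable each worker is and weights responses accordingly.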

More usage examples

Implemented Aggregation Methods

Below is the list of implemented methods, each marked as available (✅) or in progress (🟡).

Categorical Responses

Method Status
Majority Vote ✅
One-coin Dawid-Skene ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
KOS ✅
MACE ✅

Multi-Label Responses

Method Status
Binary Relevance ✅

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅

Learning from Crowds

Method Status
CrowdLayer ✅
CoNAL ✅

Citation

@article{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2024},
  journal   = {Journal of Open Source Software},
  volume    = {9},
  number    = {96},
  pages     = {6227},
  publisher = {The Open Journal},
  doi       = {10.21105/joss.06227},
  issn      = {2475-9066},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  language  = {english},
}

Support and Contributions

Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub according to the guidelines in CONTRIBUTING.md.

License

© Crowd-Kit team authors, 2020–2024. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.