🏷 Open Source Data Annotation & Labeling Tools
At ZenML we believe that annotation and labeling workflows are a core part of
the machine learning lifecycle. Because ZenML is itself an open-source tool, we
want to highlight and recognize the variety of tools available to help your
workflows become more data-centric. We used three core criteria to decide
whether a particular tool made it onto the list:
The tool has an open-source licence.
The tool is actively maintained.
The tool is functional and fit for purpose.
We welcome contributions to this list, so if you know of a tool that
we've missed or if you've built one yourself, please do create a PR!
🔥 Do you use these tools or do you want to add one to your MLOps stack? At
ZenML, we are looking for design partnerships and collaboration to develop the
integrations and workflows around using annotation within the MLOps lifecycle.
If you'd like to learn more, please join our
Slack and leave us a message!
Multi Modal / Multi Domain

| Name | Description | License |
| --- | --- | --- |
| Acharya | A Data Centric MLOps tool for your Named Entity Recognition projects | ? |
| Adala | An Autonomous Data (Labeling) Agent framework. | Apache-2 |
| Classifai | A comprehensive open-source data annotation platform | Apache-2 |
| Computer Vision Annotation Tool (CVAT) | A free, online, interactive video and image annotation tool for computer vision | MIT |
| Data Annotator for Machine Learning (DAML) | An application that helps machine learning teams facilitate the creation and management of annotations | Apache-2 |
| DataGym | Open source annotation and labeling tool for image and video assets | MIT |
| Diffgram | Training Data (Data Labeling, Annotation, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale | ELv2 |
| Hover | Explore and label on a map of raw data. Handles text, audio and images. | MIT |
| Label Studio | A multi-type data labeling and annotation tool with standardized output format | Apache-2 |
| Pigeon | A simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook | Apache-2 |
| QSL: Quick and Simple Labeler | A quick and simple tool for labeling images, videos and time series data, right from Jupyter | MIT |
| Shoonya | Platform to annotate and label data at scale | MIT |
| Tator | Video analytics web platform | AGPL-3 |
| TornadoAi | A human-in-the-loop machine learning framework | AGPL-3 |
| Universal Data Tool | A web/desktop app for editing and annotating images, text, audio and documents, and for viewing and editing any data defined in the extensible .udt.json and .udt.csv standard | MIT |
| VGG Image Annotator (VIA) | A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers | BSD-2 |
| VIAME | Video and Image Analytics for Multiple Environments | Custom |
| Xtreme1 | An all-in-one data labeling and annotation platform for multimodal data training that supports 3D LiDAR point cloud, image, and LLM | Apache-2 |


| Name | Description | License |
| --- | --- | --- |
| Annotation Lab | An NLP annotation tool included in spark-nlp | Apache-2 |
| Argilla | A production-ready Python framework for exploring, annotating, and managing data in NLP projects | Apache-2 |
| bulk | Bulk is a quick developer tool to apply some bulk labels | MIT |
| CoreNLP | A Java suite of core NLP tools | GPL-3 |
| DataQA | Labeling platform for text using weak supervision | GPL-3 |
| doccano | An open source text annotation tool supporting text classification, sequence labeling and sequence to sequence tasks | MIT |
| FLAT - FoLiA Linguistic Annotation Tool | A web-based linguistic annotation environment based around the FoLiA format, an XML-based format for linguistic annotation | GPL-3 |
| INCEpTION | A semantic annotation platform offering intelligent annotation assistance and knowledge management | Apache-2 |
| knodle | Knodle (Knowledge-supervised Deep Learning Framework) | Apache-2 |
| Markup | A web-based document annotation tool, powered by GPT-4 | Unknown |
| NER Annotator for Spacy | NER Annotator for SpaCy allows you to create training data for a custom NER model with custom tags. | MIT |
| NPLM | Noisy Partial Label Model (NPLM) | N/A |
| Potato | An annotation framework with 20+ templates, editable UI, quality control, data management and an option to add a survey for crowdsourcing | PolyForm Shield |
| refinery | The data scientist's open-source choice to scale, assess and maintain natural language data. | Apache-2 |
| Slate | A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python | ISC |
| SMART | A tool for building labeled training datasets for supervised machine learning tasks in NLP | MIT |
| SpaCy annotator | Spacy NER annotator using ipywidgets | N/A |
| Small-Text | Active Learning for Text Classification | MIT |
| Snorkel | Programmatically Build and Manage Training Data | Apache-2 |
| skweak | skweak: Weak supervision for NLP | MIT |
| TALEN | A way to do annotations for NER | Custom |
| Theme | Minimalistic CLI labeling tool for text classification | MIT |
| YEDDA | A lightweight collaborative text span annotation tool | Apache-2 |
| WeaSEL | WeaSEL: Weakly Supervised End-to-end Learning | Apache-2 |


| Name | Description | License |
| --- | --- | --- |
| 3D Slicer | Visualization, processing, segmentation, registration, and analysis of medical, biomedical, and other 3D images and meshes | BSD |
| Annotate Lab | Simplifying Image Annotation | MIT |
| Annotorious | A JavaScript library for image annotation | BSD-3 |
| AnyLabeling | Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything, MobileSAM | GPL-3 |
| autodistill | Images to inference with no labeling (use foundation models to train supervised models) | Apache-2 |
| bbox-visualizer | Make drawing and labeling bounding boxes easy as cake | MIT |
| Bounding Box Editor | A JavaFX desktop application for creating image-object-annotations with bounding boxes | GPL-3 |
| CATMAID | The Collaborative Annotation Toolkit for Massive Amounts of Image Data | GPL-3 |
| COCO Annotator | A web-based image segmentation tool for object detection, localization, and keypoints | MIT |
| DeepLabel | A cross-platform desktop image annotation tool for machine learning | MIT |
| ilastik | Segment, classify, track and count your cells or other experimental data | Custom |
| ImageTagger | An open source online platform for collaborative image labeling | MIT |
| imglab | A web based tool to label images for objects that can be used to train dlib or other object detectors | MIT |
| KNOSSOS | A software tool for the visualization and annotation of 3D image data, developed for the rapid reconstruction of neural morphology and connectivity | GPL-2 |
| labelCloud | A lightweight tool for labeling 3D bounding boxes in point clouds | GPL-3 |
| LabelFlow | An open platform for image labeling | Custom |
| labelme | Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation) | Custom |
| LabelImg | A graphical image annotation tool for labeling object bounding boxes in images | MIT |
| LOST | A flexible web-based framework for semi-automatic image annotation | MIT |
| Make Sense | A free-to-use online tool for labeling photos | GPL-3 |
| MyVision | Computer vision based ML training data generation tool | GPL-3 |
| OHIF Medical Imaging Viewer | OHIF zero-footprint DICOM viewer and oncology specific Lesion Tracker | MIT |
| OpenLabeler | An open source desktop application for annotating objects for AI applications | Apache-2 |
| Pixano | A web-based smart-annotation tool for computer vision applications | CeCILL-C |
| Scalabel | A web-based visual data annotation tool, supporting both 2D and 3D data labeling | Apache-2 |
| webKnossos | A fully cloud- and browser-based 3D annotation tool for distributed large-scale data analysis in light- and electron-microscopy based Connectomics | AGPL-3 |
| Yolo_Label | GUI for marking bounding boxes of objects in images for training YOLO neural networks | MIT |


| Name | Description | License |
| --- | --- | --- |
| DIVE | Media annotation and analysis tools for web and desktop | Apache-2 |
| UltimateLabeling | A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker | MIT |


| Name | Description | License |
| --- | --- | --- |
| aubio | A library for audio and music analysis | GPL-3 |
| audino | Open source audio annotation tool | MIT |
| Praat | Annotation tool for phonetics analysis | GPL-3 |
| Peaks.js | JavaScript UI component for interacting with audio waveforms | LGPL-3 |
| Wavesurfer.js | Navigable waveform built on Web Audio and Canvas | BSD-3 |


| Name | Description | License |
| --- | --- | --- |
| sktime | A framework for machine learning with time series | BSD-3 |


| Name | Description | License |
| --- | --- | --- |
| Compose | Automated prediction engineering. Allows you to easily structure prediction problems and generate labels for supervised learning | BSD-3 |
| Encord Active | Toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling | Apache-2 |
| NeuroTrALE | Annotation software for brain mapping, supporting 3D imaging and annotation | BSD-2 |
| OpenCRAVAT | A modular annotation tool for genomic variants | MIT |
| PatchSorter | An open-source digital pathology tool for histologic object labeling | BSD-3 |
| Personal Cancer Genome Reporter (PCGR) | A stand-alone software package for translation of individual tumor genomes for precision cancer medicine | MIT |
| Quepid | Gather Human Judgements (aka Explicit Ratings) for Search Quality. Also a safe space to play with your search algorithm. | Apache-2 |

Thanks to the creators of these other repositories (and this one!) for getting
us going down the path of creating our own. I used these efforts to get started
in my survey of the space before adding, updating and pruning as per the
open-source and other criteria specified above.