
An Easy Annotation Tool for Natural Language Processing

Primary language: Vue

Eevee: An Easy Annotation Tool

Developed by: Axel Sorensen (https://www.linkedin.com/in/axel-sorensen/)

Accompanying paper: https://aclanthology.org/2024.law-1.20/

Installation

No installation is needed: just go to https://axelsorensendev.github.io/Eevee/ and get started!

Usage

A quick introductory video is available at https://www.youtube.com/watch?v=HsOsfckvnQo

Setup page

On the setup page, the user defines the annotation environment. Tasks are configured in the task field, where the user specifies the input column (containing the input text) and the output column (for the target task) and adds the desired labels. Labels can also be imported automatically from the annotated file (if it already contains annotations), and a default label can be set for empty annotations. For utterance-level tasks (i.e. classification), the annotation is not saved in a column but is stored in a comment above the text, in the form '# intent = inform'. All settings can be imported from and exported to 'config files', which let users create and reuse predefined tasks. Once a dataset has been imported, the tabular data field gives a simple overview of the raw data belonging to each utterance, and the user can add new columns or remove existing ones as needed. Once the data and tasks are ready, the user clicks 'Annotate' to start.
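To make the file layout concrete, here is a minimal Python sketch that reads tab-separated annotation text in which each token is on its own line, utterances are separated by blank lines, and utterance-level labels are stored as comments such as '# intent = inform'. The three-column layout (index, token, label) and the example sentences are assumptions for illustration; this is not Eevee's own parser.

```python
def parse_tsv(text):
    """Split tab-separated annotation text into utterances.

    Each utterance is a dict with:
      - "comments": key/value pairs taken from lines like "# intent = inform"
      - "rows": the remaining (token-level) lines, split on tabs
    """
    utterances = []
    current = {"comments": {}, "rows": []}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current utterance, if it has any content.
            if current["comments"] or current["rows"]:
                utterances.append(current)
                current = {"comments": {}, "rows": []}
            continue
        if line.startswith("#"):
            # Utterance-level annotation stored as "# key = value".
            body = line[1:].strip()
            if " = " in body:
                key, value = body.split(" = ", 1)
                current["comments"][key.strip()] = value.strip()
            continue
        current["rows"].append(line.split("\t"))
    # The last utterance may not be followed by a blank line.
    if current["comments"] or current["rows"]:
        utterances.append(current)
    return utterances


# Hypothetical example file: index, token and label columns.
example = (
    "# intent = inform\n1\tit\tO\n2\tis\tO\n3\tsunny\tO\n"
    "\n"
    "# intent = greet\n1\thello\tO\n"
)

for utt in parse_tsv(example):
    print(utt["comments"].get("intent"), [row[1] for row in utt["rows"]])
# inform ['it', 'is', 'sunny']
# greet ['hello']
```

Keeping the utterance-level label in a comment means the token columns stay untouched, so token-level and utterance-level tasks can live in the same file.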

Annotation interface

In the annotation interface, an overview of the tasks is shown on the left and the current utterance with its annotation in the middle of the screen, with status buttons at the bottom and a progress bar at the top. We provide two annotation modes, "keyboard mode" and "search mode", which can be toggled with the button at the top-right of the utterance. Keyboard mode is the default for tasks with fewer than 10 labels and lets the annotator select a label with the number keys. Search mode instead uses a pop-up box in which the annotator can search for and select a label. Both interfaces are shown in the screenshots below:
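The two annotation modes can be summarised in a short Python sketch. The 10-label threshold comes from the description above; the assumption that keys 1 to 9 select the first to ninth label, and that search is a case-insensitive substring match, are illustrative choices and not necessarily how Eevee implements them.

```python
def default_mode(labels):
    """Default annotation mode: keyboard mode for fewer than 10 labels."""
    return "keyboard" if len(labels) < 10 else "search"


def label_for_key(labels, key):
    """Keyboard mode: map a number key to a label.

    Assumption: "1".."9" select the first..ninth label; other keys select nothing.
    """
    if len(key) != 1 or key not in "123456789":
        return None
    index = int(key) - 1
    return labels[index] if index < len(labels) else None


def search_labels(labels, query):
    """Search mode: keep labels containing the query (case-insensitive)."""
    q = query.lower()
    return [label for label in labels if q in label.lower()]


labels = ["inform", "request", "greet"]
print(default_mode(labels))           # keyboard
print(label_for_key(labels, "2"))     # request
print(search_labels(labels, "RE"))    # ['request', 'greet']
```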

Offline use

After the Eevee page has been visited once, the tool is cached in the browser and can be used offline. It can also be installed from the browser as a Progressive Web App (PWA), so that it can be used like a native desktop app.

Shortcuts

The main keyboard shortcuts available in the annotation interface are:

  • Arrows up/down: navigate between tasks (also used for navigating through labels in 'search mode')
  • Arrows left/right: navigate through data
  • Number keys: select a label (when the task has fewer than 10 labels)
  • Spacebar: mark an annotation instance as 'done'
  • Enter: select the highlighted label in the 'search mode' popup
  • Esc: close the 'search mode' popup
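The navigation shortcuts above behave like a small key-to-action dispatch over the annotation state. The Python sketch below models that idea; the key names follow the browser's KeyboardEvent.key values, and the clamping at the first and last item (rather than wrapping around) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical annotation state: current utterance, current task, done set."""
    num_utterances: int
    num_tasks: int
    utterance: int = 0
    task: int = 0
    done: set = field(default_factory=set)

    def handle_key(self, key):
        if key == "ArrowUp":        # previous task
            self.task = max(self.task - 1, 0)
        elif key == "ArrowDown":    # next task
            self.task = min(self.task + 1, self.num_tasks - 1)
        elif key == "ArrowLeft":    # previous utterance
            self.utterance = max(self.utterance - 1, 0)
        elif key == "ArrowRight":   # next utterance
            self.utterance = min(self.utterance + 1, self.num_utterances - 1)
        elif key == " ":            # spacebar: mark the current utterance as done
            self.done.add(self.utterance)


s = Session(num_utterances=3, num_tasks=2)
for key in ["ArrowRight", "ArrowRight", " ", "ArrowDown", "ArrowLeft"]:
    s.handle_key(key)
print(s.utterance, s.task, s.done)  # 1 1 {2}
```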

Use-cases

Compatibility with other services

Datasets library

To use data from the Hugging Face datasets library with Eevee, we provide the hf2conll.py script. To use it, follow these steps:

  • Find the dataset you would like to add annotation to: https://huggingface.co/datasets
  • Download and convert the data, e.g. for conll2003: python3 scripts/hf2conll.py conll2003
  • Import the data into Eevee and use the setup page as explained in Usage
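The conversion step can be illustrated with a minimal Python sketch that turns token-level examples (a "tokens" list plus integer tags, as in the conll2003 dataset's "ner_tags" feature) into tab-separated text with one token per line and blank lines between sentences. This is not the code of hf2conll.py; the output columns (index, token, label) are an assumption.

```python
def to_conll(examples, label_names, label_key="ner_tags"):
    """Convert token-level examples into tab-separated CoNLL-style text.

    examples: iterable of dicts with a "tokens" list and an integer tag list.
    label_names: list mapping integer tag ids to label strings.
    """
    blocks = []
    for example in examples:
        lines = [
            f"{i}\t{token}\t{label_names[tag]}"
            for i, (token, tag) in enumerate(
                zip(example["tokens"], example[label_key]), start=1
            )
        ]
        blocks.append("\n".join(lines))
    # Sentences are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


names = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
print(to_conll([{"tokens": ["EU", "rejects", "German"], "ner_tags": [3, 0, 7]}], names))
```

With the datasets library installed, the examples and label names could come from something like `ds = load_dataset("conll2003", split="train")` and `ds.features["ner_tags"].feature.names`; the exact dataset identifier on the Hub may differ.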

MaChAmp

The data exported from Eevee can be used directly in the MaChAmp toolkit to train and evaluate state-of-the-art models. However, MaChAmp requires a dataset configuration file, which can be generated automatically from Eevee's output with the eevee2machamp script in the scripts folder. The script expects an Eevee dataset file and the path to the training data (and, optionally, the path to the development data), as follows:

python3 scripts/eevee2machamp.py pokemon.json pokemon.conll

The script will produce a MaChAmp dataset configuration file and the training command.
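For reference, a MaChAmp dataset configuration is a JSON file that maps a dataset name to its data paths, the word column and one entry per task. The Python sketch below builds such a structure; it is not the eevee2machamp script, and the dataset name, column indices (0-based) and task choice are hypothetical.

```python
import json


def machamp_config(name, train_path, tasks, word_idx=1, dev_path=None):
    """Build a MaChAmp-style dataset configuration as a dict.

    tasks: maps a task name to (column_idx, task_type), where task_type is a
    MaChAmp task type such as "seq" or "seq_bio".
    """
    dataset = {
        "train_data_path": train_path,
        "word_idx": word_idx,
        "tasks": {
            task: {"task_type": task_type, "column_idx": column}
            for task, (column, task_type) in tasks.items()
        },
    }
    if dev_path is not None:
        dataset["dev_data_path"] = dev_path
    return {name: dataset}


config = machamp_config("pokemon", "pokemon.conll", {"ner": (2, "seq_bio")})
print(json.dumps(config, indent=4))
```

Writing this JSON to a file gives a configuration that MaChAmp's training script can be pointed to; the script in the scripts folder additionally prints the matching training command.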

Citation

If you use the Eevee annotation tool in your projects, please cite our paper:

@inproceedings{sorensen-etal-2024-eevee,
    title = "{EEVEE}: An Easy Annotation Tool for Natural Language Processing",
    author = "Sorensen, Axel  and
      Peng, Siyao  and
      Plank, Barbara  and
      Van Der Goot, Rob",
    editor = "Henning, Sophie  and
      Stede, Manfred",
    booktitle = "Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)",
    month = mar,
    year = "2024",
    address = "St. Julians, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.law-1.20",
    pages = "216--221",
    abstract = "Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets. There is a wide variety of tools available; setting up these tools is however a hindrance. We propose Eevee, an annotation tool focused on simplicity, efficiency, and ease of use. It can run directly in the browser (no setup required) and uses tab-separated files (as opposed to character offsets or task-specific formats) for annotation. It allows for annotation of multiple tasks on a single dataset and supports four task-types: sequence labeling, span labeling, text classification and seq2seq.",
}