TOKEN-is-a-MASK

Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models"

This repository contains sample notebooks for both zero-shot and few-shot scenarios based on CoNLL-03. We did not use the OntoNotes and i2b2 datasets because they are not publicly available.

Steps for Few-shot learning with "TOKEN is a MASK"

  • Step 1: Train an NER model on a small amount of available labelled data, e.g. K=50 or K=100 sentences. You can use the official HuggingFace code for this.
  • Step 2: Combine it with the "TOKEN is a MASK" approach using the prediction file from Step 1. In example_few-shot.ipynb, we first trained on 100 sentences from conll2003_100 and saved the prediction file in the output directory.
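
One way to prepare the small training set for Step 1 is to sample K sentences from a CoNLL-formatted file. The sketch below is only an illustration, not the procedure used to build conll2003_100. It assumes the usual CoNLL-03 layout (one token per line, token in the first column, NER tag in the last column, blank lines between sentences). The function names `read_conll`, `sample_k_sentences`, and `write_conll` are hypothetical and do not come from this repository.

```python
import random


def read_conll(text):
    """Parse CoNLL-style text into a list of sentences.

    Each sentence is a list of (token, tag) pairs. Assumes the token is the
    first column and the NER tag is the last column of every line.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def sample_k_sentences(sentences, k, seed=42):
    """Return k sentences sampled without replacement (hypothetical helper).

    A fixed seed keeps the few-shot subset reproducible across runs.
    """
    if k >= len(sentences):
        return list(sentences)
    return random.Random(seed).sample(sentences, k)


def write_conll(sentences):
    """Serialize sentences back to two-column CoNLL text (token, tag)."""
    blocks = ("\n".join(f"{tok} {tag}" for tok, tag in sent) for sent in sentences)
    return "\n\n".join(blocks) + "\n"
```

For example, `write_conll(sample_k_sentences(read_conll(text), k=100))` would produce a 100-sentence training file that a standard token-classification training script can read, provided that script accepts two-column CoNLL input.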

BibTeX entry and citation info

@article{Davody2022TOKENIA,
  title={TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models},
  author={Ali Davody and David Ifeoluwa Adelani and Thomas Kleinbauer and Dietrich Klakow},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.07841}
}