This repository contains sample notebooks for both zero-shot and few-shot scenarios based on CoNLL-03. We did not use the OntoNotes and i2b2 datasets because they are not publicly available.
- Step 1: Train an NER model on a small amount of available labelled data, e.g., K = 50 or 100 sentences. You can use the official HuggingFace NER (token classification) example code for this.
- Step 2: Combine the trained model with the "TOKEN is a MASK" approach, using the prediction file from Step 1. In `example_few-shot.ipynb`, we first trained on 100 sentences (`conll2003_100`) and saved the prediction file in the output directory.
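The two steps above can be illustrated with a small, dependency-free sketch. It is not the code from `example_few-shot.ipynb`. It assumes a hypothetical combination rule: keep the entity span boundaries predicted in Step 1, build a "TOKEN is a [MASK]" prompt for each span, and relabel the span with the entity type whose label word gets the highest `[MASK]` score. The verbalizer word lists in `LABEL_WORDS`, the `threshold`, and the `mask_scores` callable are all illustrative assumptions. Here `mask_scores` is a toy stub; a real setup might wrap a masked language model, for example a HuggingFace `fill-mask` pipeline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical verbalizer: words a masked LM might predict for [MASK],
# mapped to CoNLL-03 entity types. These word lists are illustrative only.
LABEL_WORDS: Dict[str, List[str]] = {
    "PER": ["person", "man", "woman", "player"],
    "LOC": ["location", "city", "country"],
    "ORG": ["organization", "company", "team"],
    "MISC": ["nationality", "language", "event"],
}


def build_prompt(span: str, mask_token: str = "[MASK]") -> str:
    """Cloze prompt of the form 'TOKEN is a MASK' (BERT-style mask token by default)."""
    return f"{span} is a {mask_token}"


def entity_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return [start, end) index pairs of entity spans in a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:  # a new B- tag closes the previous span
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def retype(
    tokens: List[str],
    tags: List[str],
    mask_scores: Callable[[str], Dict[str, float]],
    threshold: float = 0.3,
) -> List[str]:
    """Keep Step 1 span boundaries; relabel each span with the entity type whose
    best label word scores highest for [MASK], if that score reaches `threshold`."""
    out = list(tags)
    for start, end in entity_spans(tags):
        scores = mask_scores(build_prompt(" ".join(tokens[start:end])))
        type_scores = {
            etype: max(scores.get(w, 0.0) for w in words)
            for etype, words in LABEL_WORDS.items()
        }
        best = max(type_scores, key=type_scores.get)
        if type_scores[best] >= threshold:
            out[start:end] = [f"B-{best}"] + [f"I-{best}"] * (end - start - 1)
    return out


def toy_mlm(prompt: str) -> Dict[str, float]:
    """Stand-in for a real masked LM: fixed label-word scores per prompt."""
    table = {
        "Merkel is a [MASK]": {"person": 0.8, "city": 0.1},
        "Paris is a [MASK]": {"city": 0.7, "company": 0.2},
    }
    return table.get(prompt, {})


if __name__ == "__main__":
    tokens = ["Merkel", "visited", "Paris", "."]
    step1_tags = ["B-ORG", "O", "B-LOC", "O"]  # pretend output of the Step 1 model
    print(retype(tokens, step1_tags, toy_mlm))
    # → ['B-PER', 'O', 'B-LOC', 'O']
```

In this sketch the Step 1 model decides *where* entities are, and the prompt only decides *which type* they get; spans whose label words all score below the threshold keep their Step 1 type. The `mask_token` parameter exists because the mask string differs between models (e.g., `[MASK]` for BERT, `<mask>` for RoBERTa).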
@article{Davody2022TOKENIA,
title={TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models},
author={Ali Davody and David Ifeoluwa Adelani and Thomas Kleinbauer and Dietrich Klakow},
journal={ArXiv},
year={2022},
volume={abs/2206.07841}
}