


NER with Partial Annotations

Code for the CoNLL2019 paper on NER with Partial Annotations. See also the paper in the ACL Anthology.


NOTE: this code uses AllenNLP 0.8.4 and ccg_nlpy. AllenNLP in particular has changed a lot since we wrote this code, so getting the right version is important!

$ pip install ccg_nlpy allennlp==0.8.4
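Because the exact AllenNLP version matters, a quick pre-flight check can save some debugging. The sketch below is not part of the repository. It only reads installed package metadata (via `importlib.metadata`, falling back to `pkg_resources` on older interpreters) and reports anything that does not match the pins above. The names `installed_version`, `check_requirements`, and `REQUIRED` are hypothetical.

```python
"""Hypothetical pre-flight check for the pinned dependencies (not part of the repo)."""


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Older interpreters (AllenNLP 0.8.4 predates Python 3.8): use setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Pins taken from the pip command above; None means "any version is fine".
REQUIRED = {"allennlp": "0.8.4", "ccg_nlpy": None}


def check_requirements(required, lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, need {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements(REQUIRED) or ["all pinned dependencies look OK"]:
        print(line)
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed.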

Data & Embeddings

You can see some sample data in data/eng. These files are in the TextAnnotation (TAJSON) format from ccg_nlpy.
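For orientation, here is a minimal sketch of reading one of these files with only the standard library and turning its NER view into BIO tags. It assumes the usual CogComp TextAnnotation JSON layout (a top-level `tokens` list and a `views` list whose `viewData` entries hold `constituents` with `label`, `start`, and an exclusive `end`). The view name `NER_CONLL` and the helper names `load_ta` and `ta_to_bio` are assumptions, so check them against the files in data/eng; ccg_nlpy also provides its own TextAnnotation class for loading this format.

```python
import json


def load_ta(path):
    """Read a TAJSON file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def ta_to_bio(ta, view_name="NER_CONLL"):
    """Return (token, BIO tag) pairs for one span-label view of a TA dict.

    Assumes the CogComp layout: ta["tokens"] is a list of strings and each
    constituent has "label", "start" and an exclusive "end" token index.
    The default view name is an assumption, not taken from this repo.
    """
    tokens = ta["tokens"]
    tags = ["O"] * len(tokens)
    for view in ta.get("views", []):
        if view.get("viewName") != view_name:
            continue
        for data in view.get("viewData", []):
            for cons in data.get("constituents", []):
                start, end, label = cons["start"], cons["end"], cons["label"]
                tags[start] = "B-" + label  # first token of the span
                for i in range(start + 1, end):
                    tags[i] = "I-" + label  # remaining tokens of the span
    return list(zip(tokens, tags))
```

Usage would be `ta_to_bio(load_ta(path))` for any file under data/eng. Tokens not covered by a constituent come back tagged `O`.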

You will need to set the paths to the embeddings and to the data in utils.py.

If you want to use BERT instead of regular embeddings, set USING_BERT in utils.py to True.
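Before a long training run, it can help to confirm that every configured path exists. The sketch below is hypothetical: apart from `USING_BERT`, the variable names and keys are placeholders rather than the actual names in utils.py, and skipping the embeddings path when BERT is on is an assumption based on BERT replacing the regular embeddings.

```python
import os

# Hypothetical stand-ins for the settings in utils.py. Only USING_BERT is a
# name from the README; the path keys and values below are placeholders.
USING_BERT = False
PATHS = {
    "embeddings": "/path/to/embeddings",
    "train": "/path/to/train",
    "dev": "/path/to/dev",
    "test": "/path/to/test",
}


def missing_paths(paths, using_bert=USING_BERT):
    """Return the sorted names of configured paths that do not exist.

    Assumption: with BERT enabled, the regular embeddings are not needed,
    so the "embeddings" entry is not checked.
    """
    return sorted(
        name
        for name, path in paths.items()
        if not (using_bert and name == "embeddings") and not os.path.exists(path)
    )


if __name__ == "__main__":
    missing = missing_paths(PATHS)
    print("missing:", ", ".join(missing) if missing else "none")
```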

Converting CoNLL to TAJSON:

If you have labeled text files in CoNLL format (one token and one label per line), you can use conll2tajson.py to convert them. It takes a folder of CoNLL files and writes a folder of TAJSON files.

$ python conll2tajson.py input_folder output_folder
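To illustrate what the conversion involves, here is a rough sketch of the same idea. It is not the repository's conll2tajson.py. It assumes whitespace-separated lines with the token in the first column and a BIO label in the last, blank lines between sentences, and it writes only a subset of the TextAnnotation fields (`text`, `tokens`, sentence end positions, and one NER view under the assumed name `NER_CONLL`). ccg_nlpy may expect additional fields, so prefer the provided script for real data. The function names are hypothetical, and keeping the input file names for the outputs is also an assumption.

```python
import json
import os


def read_conll(lines):
    """Group CoNLL lines into sentences of (token, label) pairs.

    Assumption: token in the first column, label in the last column,
    blank lines separate sentences.
    """
    sentences, current = [], []
    for line in lines:
        parts = line.split()
        if not parts:  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bio_to_spans(labels):
    """Turn BIO labels into (label, start, end) spans, end exclusive."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(list(labels) + ["O"]):  # sentinel "O" closes the last span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != kind)
        if tag == "O" or starts_new:
            if kind is not None:
                spans.append((kind, start, i))
            start, kind = (i, tag[2:]) if starts_new else (None, None)
    return spans


def to_tajson(sentences, view_name="NER_CONLL"):
    """Build a minimal TextAnnotation-style dict (a subset of the real fields)."""
    tokens, sent_ends, constituents = [], [], []
    for sent in sentences:
        offset = len(tokens)
        # Spans are computed per sentence so an entity never crosses a boundary.
        for kind, start, end in bio_to_spans([label for _, label in sent]):
            constituents.append(
                {"label": kind, "score": 1.0, "start": offset + start, "end": offset + end}
            )
        tokens.extend(token for token, _ in sent)
        sent_ends.append(len(tokens))
    return {
        "text": " ".join(tokens),
        "tokens": tokens,
        "sentences": {"sentenceEndPositions": sent_ends},
        "views": [{
            "viewName": view_name,
            "viewData": [{"viewName": view_name, "constituents": constituents}],
        }],
    }


def convert_folder(input_folder, output_folder):
    """Convert every file in input_folder; outputs keep the same file names."""
    os.makedirs(output_folder, exist_ok=True)
    for name in sorted(os.listdir(input_folder)):
        src = os.path.join(input_folder, name)
        if not os.path.isfile(src):
            continue
        with open(src, encoding="utf-8") as f:
            ta = to_tajson(read_conll(f))
        with open(os.path.join(output_folder, name), "w", encoding="utf-8") as f:
            json.dump(ta, f)
```

Calling `convert_folder("input_folder", "output_folder")` mirrors the command above. Computing spans per sentence, rather than over the whole file, avoids merging an entity at the end of one sentence with an `I-` tag at the start of the next.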


To reproduce the main results:

$ python main_ours.py <lang>

For the other experiments, the script names should be self-explanatory!


If you use this code, please cite us!

    @inproceedings{MCTR19,
        author = {Stephen Mayhew and Snigdha Chaturvedi and Chen-Tse Tsai and Dan Roth},
        title = {{Named Entity Recognition with Partially Annotated Training Data}},
        booktitle = {Proc. of the Conference on Computational Natural Language Learning (CoNLL)},
        year = {2019},
        url = "https://cogcomp.seas.upenn.edu/papers/MCTR19.pdf",
        funding = {LORELEI},
    }