This is the official repository for CNER: Concept and Named Entity Recognition, published at NAACL 2024 (main conference). If you use any part of this work, please consider citing our paper as follows:
    @inproceedings{martinelli-etal-2024-cner,
        title = "{CNER}: Concept and Named Entity Recognition",
        author = "Martinelli, Giuliano and
          Molfese, Francesco and
          Tedeschi, Simone and
          Fern{\'a}ndez-Castro, Alberte and
          Navigli, Roberto",
        editor = "Duh, Kevin and
          Gomez, Helena and
          Bethard, Steven",
        booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
        month = jun,
        year = "2024",
        address = "Mexico City, Mexico",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2024.naacl-long.461",
        pages = "8329--8344",
    }
This repository contains the scripts for evaluating CNER models, together with the official outputs of the CNER system, which can be used to reproduce the results reported in the paper. We also release:
- Our silver training and gold evaluation data on Hugging Face.
- A Concept and Named Entity Recognition model trained on CNER-silver, available on the Hugging Face 🤗 Models hub (a minimal inference sketch is shown below). Specifically, we fine-tuned a pretrained DeBERTa-v3-base for token classification using the default hyperparameters, optimizer, and architecture of Hugging Face (see the Tutorial Notebook); the results of this model may therefore differ from the ones reported in the paper.
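The released model can be used out of the box with the Hugging Face `pipeline` API. The snippet below is a minimal sketch: the model identifier `Babelscape/cner-base` is only a placeholder, so replace it with the actual model ID from the Models hub.

```python
from transformers import pipeline

# Minimal sketch: tagging a sentence with the released CNER token-classification model.
# NOTE: "Babelscape/cner-base" is a placeholder identifier; replace it with the actual
# model ID listed on the Hugging Face Models hub.
tagger = pipeline(
    "token-classification",
    model="Babelscape/cner-base",
    aggregation_strategy="simple",  # merge sub-word pieces into whole spans
)

print(tagger("Commander Donald S. Jones served in the United States Navy."))
```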
To set up the evaluation environment:

- Clone the repository: `git clone https://github.com/Babelscape/cner.git`
- Create a conda environment: `conda create -n env-name python==3.9`
- Install the requirements: `pip install -r requirements.txt`
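Note that the conda environment (here `env-name`) should be activated with `conda activate env-name` before installing the requirements and running the evaluation scripts.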
To evaluate a CNER model, run the following script:
    python scripts/evaluate.py --predictions_path path_to_predictions
where `path_to_predictions` is a file containing CNER predictions over the CNER-gold dataset split.
Supported formats (a sketch for producing a compatible file is shown after this list):

- `.jsonl`, with one JSON object per sentence:

  `{"sentence_id": "55705165.21", "tokens": ["Commander", ..., "."], "predictions": ["B-PER", ..., "O"]}`

- `.tsv`, with one tab-separated row per sentence:

  | Sentence_id | Tokens | predictions |
  |---|---|---|
  | "55705165.21" | ['Commander', 'Donald', 'S.', ..., '.'] | ['B-PER', 'I-PER', ..., 'O'] |
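A compatible `.jsonl` predictions file can be produced with a few lines of Python, as in the minimal sketch below (the `my_predictions` list and its labels are purely illustrative and should come from your own model):

```python
import json

# Minimal sketch: writing predictions in the .jsonl format accepted by scripts/evaluate.py.
# `my_predictions` is an illustrative list of (sentence_id, tokens, labels) triples;
# in practice these should be produced by your model over the CNER-gold split.
my_predictions = [
    ("55705165.21", ["Commander", "Donald", "S.", "."], ["B-PER", "I-PER", "I-PER", "O"]),
]

with open("my_predictions.jsonl", "w", encoding="utf-8") as f:
    for sentence_id, tokens, labels in my_predictions:
        record = {"sentence_id": sentence_id, "tokens": tokens, "predictions": labels}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting file can then be passed to the evaluation script via `--predictions_path my_predictions.jsonl`.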
The official outputs of our CNER system can be found at `outputs/cner_output.jsonl`.
To reproduce our CNER results, run the following script:
    python scripts/evaluate.py --predictions_path outputs/cner_output.jsonl