This is the official repository for CNER: Concept and Named Entity Recognition, published at NAACL 2024 (main conference). If you use any part of this work, please consider citing our paper as follows:
    @inproceedings{martinelli-etal-2024-cner,
        title = "{CNER}: Concept and Named Entity Recognition",
        author = "Martinelli, Giuliano and
          Molfese, Francesco and
          Tedeschi, Simone and
          Fern{\'a}ndez-Castro, Alberte and
          Navigli, Roberto",
        editor = "Duh, Kevin and
          Gomez, Helena and
          Bethard, Steven",
        booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
        month = jun,
        year = "2024",
        address = "Mexico City, Mexico",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2024.naacl-long.461",
        pages = "8329--8344",
    }
This repository contains the scripts for evaluating CNER models, together with the official outputs of the CNER system, which can be used to reproduce the results reported in the paper. We also release:
- Our silver training and gold evaluation data on Hugging Face.
- A Concept and Named Entity Recognition model trained on CNER-silver, available on the Hugging Face 🤗 Models hub (a minimal inference sketch is shown below). Specifically, we fine-tuned a pretrained DeBERTa-v3-base for token classification using the default hyperparameters, optimizer, and architecture of Hugging Face (see the Tutorial Notebook); the results of this model may therefore differ from the ones reported in the paper.
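The released model can be used out of the box with the Hugging Face `pipeline` API. The snippet below is a minimal sketch: the model identifier `Babelscape/cner-base` is only a placeholder, so replace it with the actual model ID from the Models hub.

```python
from transformers import pipeline

# Minimal sketch: tagging a sentence with the released CNER token-classification model.
# NOTE: "Babelscape/cner-base" is a placeholder identifier; replace it with the actual
# model ID listed on the Hugging Face Models hub.
tagger = pipeline(
    "token-classification",
    model="Babelscape/cner-base",
    aggregation_strategy="simple",  # merge sub-word pieces into whole spans
)

print(tagger("Commander Donald S. Jones served in the United States Navy."))
```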
To set up the evaluation environment:

- Clone the repository: `git clone https://github.com/Babelscape/cner.git`
- Create a conda environment: `conda create -n env-name python==3.9`
- Install the requirements: `pip install -r requirements.txt`
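Note that the conda environment (here `env-name`) should be activated with `conda activate env-name` before installing the requirements and running the evaluation scripts.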
To evaluate a CNER model, run the following script:
    python scripts/evaluate.py --predictions_path path_to_predictions
where `path_to_predictions` is a file containing CNER predictions over the CNER-gold dataset split.
Supported formats (a sketch for producing a compatible file is shown after this list):

- `.jsonl`, with one JSON object per sentence:

  `{"sentence_id": "55705165.21", "tokens": ["Commander", ..., "."], "predictions": ["B-PER", ..., "O"]}`

- `.tsv`, with one tab-separated row per sentence:

  | Sentence_id | Tokens | predictions |
  |---|---|---|
  | "55705165.21" | ['Commander', 'Donald', 'S.', ..., '.'] | ['B-PER', 'I-PER', ..., 'O'] |
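A compatible `.jsonl` predictions file can be produced with a few lines of Python, as in the minimal sketch below (the `my_predictions` list and its labels are purely illustrative and should come from your own model):

```python
import json

# Minimal sketch: writing predictions in the .jsonl format accepted by scripts/evaluate.py.
# `my_predictions` is an illustrative list of (sentence_id, tokens, labels) triples;
# in practice these should be produced by your model over the CNER-gold split.
my_predictions = [
    ("55705165.21", ["Commander", "Donald", "S.", "."], ["B-PER", "I-PER", "I-PER", "O"]),
]

with open("my_predictions.jsonl", "w", encoding="utf-8") as f:
    for sentence_id, tokens, labels in my_predictions:
        record = {"sentence_id": sentence_id, "tokens": tokens, "predictions": labels}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting file can then be passed to the evaluation script via `--predictions_path my_predictions.jsonl`.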
The official outputs of our CNER system can be found at `outputs/cner_output.jsonl`.
To reproduce our CNER results, run the following script:
    python scripts/evaluate.py --predictions_path outputs/cner_output.jsonl