Data and code from the paper "Generation of Training Data for Named Entity Recognition of Artworks"

Generation of Training Data for Named Entity Recognition of Artworks

Data and pre-trained models from the paper "Generation of Training Data for Named Entity Recognition of Artworks", published in the 2023 issue of the Semantic Web Journal.

Data

Pending approval/license by the owner of the corpus.

Models

The models can be downloaded from here

SpaCy

The spaCy pre-trained model 'en_core_web_md' was used as a baseline for further training with domain-related annotations. The spaCy version is 3.3.0. Related documentation is available here.

To use the spaCy model to annotate a file of texts (see spacy_model/example_file.csv), download the model folder and run the script spacy_model/run_spacy.py as follows:

python run_spacy.py model_location example_file.csv
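For readers who want to call the model from their own code, the following is a minimal sketch of what such an annotation step could look like. It is not the contents of run_spacy.py. The helper names (texts_from_rows, annotate, main) are hypothetical, and the sketch assumes that the CSV file holds one text per row in its first column, which may not match the layout of example_file.csv.

```python
import csv
import sys


def texts_from_rows(rows, column=0):
    """Return the non-empty cells of one column from parsed CSV rows."""
    return [row[column] for row in rows if len(row) > column and row[column].strip()]


def annotate(nlp, texts):
    """Run a spaCy pipeline over the texts.

    Returns one list per text, holding (entity text, label, start, end) tuples.
    """
    return [
        [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
        for doc in nlp.pipe(texts)
    ]


def main(model_location, csv_path):
    import spacy  # imported here so the helpers above work without spaCy installed

    nlp = spacy.load(model_location)  # path to the downloaded model folder
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumption: the texts are in the first column of the file.
        texts = texts_from_rows(csv.reader(handle))
    for text, entities in zip(texts, annotate(nlp, texts)):
        print(text)
        for entity in entities:
            print("   ", entity)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Using nlp.pipe instead of calling the pipeline once per text lets spaCy batch the documents, which tends to matter for larger files.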

Flair

The Flair model was trained using GloVe (en-glove) and forward and backward Flair Embeddings (news-X). More information on these embedding models can be found in Flair's documentation.
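As an illustration of how such an embedding combination is typically assembled in Flair, here is a hedged sketch. It is not the training code from the paper. Reading "news-X" as the news-forward and news-backward pair is an assumption, and build_embeddings is a hypothetical helper name. Calling it downloads the embedding files on first use.

```python
# Identifiers taken from the description above. Reading "news-X" as the
# forward/backward pair is an assumption.
WORD_EMBEDDING_ID = "en-glove"
FLAIR_EMBEDDING_IDS = ("news-forward", "news-backward")


def build_embeddings():
    """Stack GloVe word embeddings with forward and backward Flair embeddings."""
    # Imported lazily so this module can be inspected without Flair installed.
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings

    return StackedEmbeddings(
        [WordEmbeddings(WORD_EMBEDDING_ID)]
        + [FlairEmbeddings(name) for name in FLAIR_EMBEDDING_IDS]
    )
```

The stacked object concatenates the vectors of its parts for each token, so a sequence tagger trained on top of it sees both the static GloVe vector and the contextual Flair vectors.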

To run the model on a single sentence, execute the script flair_model/RunNER.py with the following command:

python RunNER.py final-model.pt "This is a sentence"
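The same prediction can be made from Python. The sketch below shows one possible way to do it and is not the contents of RunNER.py. The helper names (spans_to_rows, tag_text) are hypothetical, and the label type "ner" is an assumption about how the model was trained.

```python
import sys


def spans_to_rows(sentence, label_type="ner"):
    """Collect (span text, tag, rounded score) for each predicted span."""
    return [
        (span.text, span.tag, round(span.score, 3))
        for span in sentence.get_spans(label_type)
    ]


def tag_text(model_path, text):
    """Load a Flair sequence tagger from disk and tag a single text."""
    # Imported lazily so spans_to_rows can be used without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)  # e.g. the downloaded final-model.pt
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return spans_to_rows(sentence)  # assumption: spans are stored under "ner"


if __name__ == "__main__" and len(sys.argv) == 3:
    for row in tag_text(sys.argv[1], sys.argv[2]):
        print(row)
```

If the list comes back empty for a sentence that clearly contains an artwork title, the label type is a likely place to check first.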