NER-benchmark

Jupyter Notebook for German named entity recognition benchmark

Primary language: Jupyter Notebook · License: MIT

A simple walkthrough for a named entity recognition (NER) setup for the German language.

This repository benchmarks different embeddings for named entity recognition (NER) in German text. As stated in my bachelor thesis, a combination of BERT embeddings and Flair embeddings yields a new best performance on the GermEval-14 NER dataset (F1 score of 86.62).

Setup

Flair Experiments: Flair and Google Colab

BERT Experiments: Transformers and Google Colab
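The Flair experiments combine BERT and Flair embeddings via Flair's stacked embeddings. A minimal sketch of such a setup is shown below; the embedding names, hyperparameters, and output path are illustrative assumptions, not the exact configuration from the thesis (flair imports are kept inside the function so the snippet loads without the dependency installed).

```python
# Sketch of a stacked-embeddings NER setup with flair.
# Embedding choices and hyperparameters are assumptions for illustration.

def train_germeval_tagger(output_dir: str = "resources/taggers/germeval-ner"):
    # flair is imported lazily so this module can be loaded without it
    from flair.datasets import GERMEVAL_14
    from flair.embeddings import (
        FlairEmbeddings,
        StackedEmbeddings,
        TransformerWordEmbeddings,
    )
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # GermEval-14 corpus (flair may require a manual download of the data files)
    corpus = GERMEVAL_14()
    tag_dictionary = corpus.make_label_dictionary(label_type="ner")

    # stack transformer (BERT) embeddings with Flair's contextual string embeddings
    embeddings = StackedEmbeddings([
        TransformerWordEmbeddings("bert-base-german-cased"),
        FlairEmbeddings("de-forward"),
        FlairEmbeddings("de-backward"),
    ])

    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_crf=True,
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(output_dir, learning_rate=0.1, mini_batch_size=32, max_epochs=150)
```

Running `train_germeval_tagger()` trains a BiLSTM-CRF tagger on top of the stacked embeddings, which is the standard Flair recipe for this kind of benchmark.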

Datasets

BIOES and BIO/IOB formats are considered in the evaluation.
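To illustrate the difference between the two schemes, here is a small sketch (not taken from the repository) that converts a BIO/IOB2 tag sequence to BIOES, where single-token entities become S- tags and entity-final tokens become E- tags:

```python
# Illustrative example of the two tagging schemes: BIO marks
# Begin/Inside/Outside, while BIOES additionally distinguishes
# End and Single-token entities.

def bio_to_bioes(tags):
    """Convert a BIO/IOB2 tag sequence to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = next_tag == f"I-{label}"
        if prefix == "B":
            # single-token entity becomes S-, otherwise stays B-
            bioes.append(f"B-{label}" if entity_continues else f"S-{label}")
        else:  # prefix == "I"
            # last token of the entity becomes E-, otherwise stays I-
            bioes.append(f"I-{label}" if entity_continues else f"E-{label}")
    return bioes

tokens = ["Angela", "Merkel", "besucht", "Berlin"]
bio    = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_bioes(bio))  # → ['B-PER', 'E-PER', 'O', 'S-LOC']
```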

The datasets used in my benchmark are CoNLL-03 and GermEval-14. Additionally, I compared several embeddings on a complaint dataset in my bachelor thesis. Unfortunately, this dataset is not public, but the results can be found in the thesis.