Evaluating Off-the-Shelf NLP Tools for German
This repository contains the scripts, dataset, and evaluation results from the paper:
Katrin Ortmann, Adam Roussel, and Stefanie Dipper. 2019. Evaluating Off-the-Shelf NLP Tools for German. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS), 212--222. [pdf] [bib]
Contents
scripts/
- The main scripts, which define how the systems are loaded and called (per annotation level): tokens.py, pos.py, morph.py, lemmas.py, depparse.py
- common.py: document model and morphology format conversion
- Evaluation scripts: eval_bounds.py for tokenization and eval_annotations.py for everything else
eval/
- The results of the evaluation are stored here in two CSV tables: results.csv for the accuracy evaluation and timing.csv for the performance evaluation.
- The plots and tables generated by scripts/analysis.py are also stored here.
data/
- Gold standard datasets (data/gold/) and system output (data/system/)
- Each system's output is in an appropriately named subdirectory, and each of these system-specific subdirectories contains one annotated output file per domain.
- The directory txt/ contains the unannotated original plaintext files.
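The data/system/ layout described above (one subdirectory per system, one annotated output file per domain) can be inspected with a short script. This is a minimal sketch and not part of the repository; the function name list_system_outputs is hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path


def list_system_outputs(system_root):
    """Map each system subdirectory name to the sorted names of its output files."""
    root = Path(system_root)
    outputs = {}
    for system_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only regular files count as output; nested directories are ignored.
        outputs[system_dir.name] = sorted(
            f.name for f in system_dir.iterdir() if f.is_file()
        )
    return outputs


if __name__ == "__main__":
    # data/system is the system output directory named in this README.
    root = Path("data/system")
    if root.is_dir():
        for system, files in list_system_outputs(root).items():
            print(f"{system}: {len(files)} output file(s)")
    else:
        print(f"{root} not found; run this from the repository root")
```

A system whose file count differs from the others is probably missing output for some domain.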
Usage
In theory you can use the provided Makefile to run the experiments, but in practice it is a lot of work to install all of these systems individually. We hope to eventually provide a Dockerfile to make running all of the experiments easier.
However, performing the evaluation (make evaluate), i.e. comparing the system output to the gold standard, and calculating performance statistics (make analysis) should work, provided you have NumPy, pandas, Matplotlib, and Seaborn installed.
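Before running the make targets, it may help to confirm that the four packages are importable. This is a minimal sketch and not part of the repository; the helper name missing_packages is hypothetical, and the check only tests whether each package can be found, not which version is installed.

```python
from importlib.util import find_spec

# Import names of the packages this README lists as requirements.
REQUIRED = ("numpy", "pandas", "matplotlib", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```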
Results Preview
A more detailed evaluation can be found in the paper cited above.
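The full tables are in eval/results.csv and eval/timing.csv and can be inspected directly. This is a minimal sketch and not part of the repository. It assumes the files are comma-separated with a header row, and because the column layout is not documented here, it only reports whatever columns it finds. The function name read_table is hypothetical.

```python
import csv
import os


def read_table(path):
    """Read a CSV file with a header row into (column names, list of row dicts)."""
    # Assumption: comma-delimited, UTF-8, first row holds the column names.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    for path in ("eval/results.csv", "eval/timing.csv"):
        if os.path.exists(path):
            columns, rows = read_table(path)
            print(f"{path}: {len(rows)} rows, columns: {columns}")
        else:
            print(f"{path} not found; run this from the repository root")
```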
Related
For more on the available off-the-shelf tools and resources for German NLP, see https://github.com/adbar/German-NLP.
License
The evaluation data is licensed under CC BY-SA 3.0, except for the TED talk sample, which is provided under CC BY-NC-ND 4.0.