German-NLP

A curated list of open-access, open-source, and off-the-shelf resources and tools developed with a particular focus on German.

This list favors resources and tools that are currently maintained and can be used off-the-shelf or with minor adjustments. It is deliberately biased toward usability and user-friendliness.

Pull requests and suggestions are welcome! See the contributing guidelines.

Corpora
Generic resources
Linguistic processing
Semantic analysis
Speech NLP
Machine Translation
Teaching resources and tutorials
More lists

Corpora

General-purpose

Historical

Specialized

Swiss German

Learner and Error Corpora

Word lists

Lists

Data acquisition

Generic resources

Frameworks

Treebanks

Annotation

Standards

Linguistic processing

Tokenization

Stemming

CISTEM

Lemmatization

Morphological analysis

Normalization

CAB
norma

POS-tagging

Syntactical parsing

Named Entity Recognition

Industry/Applications

German Decompounder for Apache Lucene / Apache Solr / Elasticsearch

Evaluation

Evaluation of different NLP toolkits

Semantic analysis

Datasets

Word embeddings and senses

Sentiment analysis datasets / polarity clues

Sentiment detection

GermEval (category to improve)

Discourse

Summarization

Tools and corpora for summarization of German texts

Psycholinguistics

Noun Associations for German

Speech NLP

Machine Translation

Parallel corpora

Teaching resources and tutorials

bubenhofer.com/korpuslinguistik/kurs/
CorpusExplorer v2.0 – Seminartauglich in einem halben Tag (seminar-ready in half a day)
deeplearning4nlp-tutorial
Uni Zürich: Sprachtechnologie in den Digital Humanities (Language Technology in the Digital Humanities) – MOOC on YouTube & Coursera

hoffart/German-NLP