stanfordnlp/CoreNLP

Wrong POS for "keine": PRON instead of DET

GeorgeS2019 opened this issue · 7 comments

Ich habe keine Übungen gemacht, weil ich keine Lust habe. (English: "I didn't do any exercises because I don't feel like it.")

Stanza tags keine as DET.
CoreNLP 4.5.6 (with the corresponding 4.5.6 German model) tags keine as PRON.

The Stanza tagger was trained on

ud-treebanks-v2.13/UD_German-GSD/de_gsd-ud-train.conllu

in which keine is treated as DET.
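To check how a treebank annotates a given word, one option is to count the UPOS tags it receives in the CoNLL-U file. The following is a minimal sketch, not part of CoreNLP or Stanza: the class name `UposCounter` and the method `countUpos` are made up for illustration, and it assumes the standard 10-column, tab-separated CoNLL-U layout (FORM in column 2, UPOS in column 4).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts the UPOS tags given to one word form in CoNLL-U text.
class UposCounter {

    static Map<String, Integer> countUpos(String conllu, String form) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : conllu.split("\\R")) {
            // Skip blank sentence separators and "# ..." comment lines.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length < 4) {
                continue;
            }
            // Skip multiword-token ranges ("3-4") and empty nodes ("3.1").
            if (cols[0].contains("-") || cols[0].contains(".")) {
                continue;
            }
            // Column 2 is FORM, column 4 is UPOS; match the form case-insensitively.
            if (cols[1].equalsIgnoreCase(form)) {
                counts.merge(cols[3], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java UposCounter <file.conllu> <form>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countUpos(text, args[1]));
    }
}
```

With Java 11 or later this can be run directly as `java UposCounter.java de_gsd-ud-train.conllu keine`, which prints a map from UPOS tag to count for that form.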

The CoreNLP tagger has not been retrained since UD 2.4, where the standard was to treat keine as PRON

Retraining taggers with updated data is less of a hassle than the general feature additions you've been requesting, so we'll put updated data for some of those models on the list.

@AngledLuffa

I have tried to connect with @manning through LinkedIn regarding CoreNLP 4.5.6, with a specific interest in the 4.5.6 German model.

@AngledLuffa

I also have an issue with the dependency parsing results. Hopefully this will go away once the German POS assignment is correct.

@AngledLuffa
I am comparing the German output of CoreNLP with that of Stanza in code.
I understand that the online CoreNLP instance is no longer running, so it will take a few extra steps to compare CoreNLP 4.5.6 with the latest Stanza.
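One way to make such a comparison repeatable is to export each tool's tags for the same tokens and list only the positions where they disagree. This is a rough sketch under stated assumptions: `TagDiff` and `diff` are hypothetical names, both tools are assumed to produce the same tokenization, and no CoreNLP or Stanza API is called here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports tokens whose tags differ between two taggers.
class TagDiff {

    // Returns one tab-separated line per disagreement:
    // 1-based position, token, tag from system A, tag from system B.
    static List<String> diff(String[] tokens, String[] tagsA, String[] tagsB) {
        if (tokens.length != tagsA.length || tokens.length != tagsB.length) {
            throw new IllegalArgumentException("token and tag arrays must have the same length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tagsA[i].equals(tagsB[i])) {
                out.add((i + 1) + "\t" + tokens[i] + "\t" + tagsA[i] + "\t" + tagsB[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the token reported in this issue: CoreNLP 4.5.6 says PRON, Stanza says DET.
        String[] tokens = {"keine"};
        String[] coreNlpTags = {"PRON"};
        String[] stanzaTags = {"DET"};
        for (String line : diff(tokens, coreNlpTags, stanzaTags)) {
            System.out.println(line);
        }
    }
}
```

If the two tools tokenize a sentence differently, the arrays will not line up and the tokens would need to be aligned first; this sketch deliberately rejects that case instead of guessing.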

@AngledLuffa

Does the German parser in CoreNLP support XPOS? I can only find UPOS.

CoreNLP

props.setProperty("annotators", "tokenize, ssplit, mwt, pos, lemma, ner, depparse");

Stanza

https://stanfordnlp.github.io/stanza/pos.html