statsmaths/cleanNLP

Same token assigned different POS labels

pchest opened this issue · 1 comment

The cnlp_annotate function from the cleanNLP package assigns different tags to the same word, 'president', at different times within the same R session. The two tags in question are 'NN' and 'NNP'.

cleanNLP version: 3.0.4

R version: 4.3.0

Operating System: Pop!_OS 22.04

Do you mean that it's assigning different part of speech tags to the exact same word in a document when you re-run the cnlp_annotate function? Or, do you just mean that it's tagging "president" with different parts of speech in different sentences?

If it's the first, that would be surprising. Do you have a minimal working example with a short fragment you could share, and could you indicate which backend you're using? If it's the second, that's not surprising at all. A phrase such as "President of France" would likely have "president" tagged as NNP (a singular proper noun), while "the president said that..." would likely have it tagged as NN (a singular non-proper noun).
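A minimal sketch of the kind of example that would separate the two cases. It assumes the udpipe backend (the report does not say which backend was in use, and this needs the udpipe package plus a model download) and that the Penn Treebank style tags appear in the `xpos` column of the token table. The helper `tags_for` and the two sample sentences are hypothetical, introduced only for illustration.

```r
library(cleanNLP)

# Assumption: udpipe backend; the original report does not name a backend.
cnlp_init_udpipe()

# Hypothetical helper: keep only the rows of a token table that match
# one word, ignoring case, together with their language-specific POS tag.
tags_for <- function(tokens, word) {
  keep <- tolower(tokens$token) == tolower(word)
  as.data.frame(tokens[keep, c("doc_id", "sid", "token", "xpos")])
}

# Illustrative input: the same word in two different contexts.
text <- c(
  "The President of France arrived.",
  "The president said that he would stay."
)

# Annotate the identical input twice in the same session.
first  <- tags_for(cnlp_annotate(text)$token, "president")
second <- tags_for(cnlp_annotate(text)$token, "president")

# Case 1: the same text annotated twice. TRUE is the expected behaviour;
# FALSE would be the surprising result worth reporting.
identical(first$xpos, second$xpos)

# Case 2: the same word in different sentences. Differing tags here
# (for example NNP versus NN) are ordinary tagger behaviour.
first
```

If the `identical()` check returns `FALSE`, posting this script together with the output of `sessionInfo()` would make the problem straightforward to reproduce.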