Inconsistent metadata for languages
Closed this issue · 1 comments
ad-si commented
import nlp from "compromise/three"
console.dir(
nlp("English, German, French").json(),
{ depth: null }
)
yields
{
text: "English, German, French",
terms: [
{
text: "English",
pre: "",
post: ", ",
tags: [ "Adjective" ],
normal: "english",
index: [ 0, 0 ],
id: "english|08E00000H",
dirty: true,
confidence: 0.6,
chunk: "Noun",
}, {
text: "German",
pre: "",
post: ", ",
tags: [ "Noun", "ProperNoun", "Demonym" ],
normal: "german",
index: [ 0, 1 ],
id: "german|08F000015",
chunk: "Noun",
dirty: true,
}, {
text: "French",
pre: "",
post: "",
tags: [ "Noun", "ProperNoun", "Demonym" ],
normal: "french",
index: [ 0, 2 ],
id: "french|08G00002H",
chunk: "Noun",
dirty: true,
}
],
}
Why does English not have the same tags as the rest?
spencermountain commented
hey Adrian - cool, I didn't know about the depth parameter!
It could be for a few reasons - perhaps the word-order, like 'the english patient' or the statistics about how it's used 'english is hard to learn', or 'the french love wine'.
You can always co-erce it by supplying a lexicon like this:
nlp('english, french', {english:'Demonym'}).debug()
cheers