"to" is a preposition and not a conjunction
NikhilVerma opened this issue · 2 comments
https://www.dictionary.com/browse/to
I am trying to build a sentence separator which can split a sentence if it has multiple verb or noun conjunctions.
The current approach is to do something like this:
const conjunctionSplit = doc
  .splitOn("#Adverb? #Verb (#Conjunction|,)")
  .splitOn("(#Conjunction|,) #Adverb? #Verb");
However, a sentence like "An organisation should make best efforts to protect it's hardware and software." gets parsed as:
[
"An organisation should make best efforts",
"to protect",
"it's",
"hardware and",
"software."
]
when it should be parsed as:
[
"An organisation should make best efforts to protect it's",
"hardware and",
"software."
]
My current workaround is to do this:
world.model.one.lexicon.to = "Preposition";
It's awesome that compromise lets me edit the lexicon so easily. But I think it should be updated in the main library as well.
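For anyone wondering why that one retag changes the split, here is a minimal toy sketch. It is not compromise's tagger or its match syntax: `baseLexicon`, `tagOf`, and `splitBeforeConjunctionVerb` are hypothetical names made up for illustration, and the rule only mimics the rough idea of "start a new chunk at a conjunction followed by a verb".

```javascript
// Toy lexicon: a hypothetical stand-in for a real tagger's word list.
const baseLexicon = {
  to: "Conjunction", // the default tag this issue is about
  and: "Conjunction",
  should: "Verb",
  make: "Verb",
  protect: "Verb",
};

// Look up a tag for a word, ignoring case and one trailing "." or ",".
function tagOf(word, lexicon) {
  const key = word.toLowerCase().replace(/[.,]$/, "");
  return lexicon[key] || "Other";
}

// Start a new chunk at every Conjunction that is directly followed by a Verb.
function splitBeforeConjunctionVerb(sentence, lexicon) {
  const words = sentence.split(/\s+/);
  const chunks = [[]];
  words.forEach((word, i) => {
    const next = words[i + 1];
    const startsClause =
      next !== undefined &&
      tagOf(word, lexicon) === "Conjunction" &&
      tagOf(next, lexicon) === "Verb";
    if (startsClause && chunks[chunks.length - 1].length > 0) {
      chunks.push([]);
    }
    chunks[chunks.length - 1].push(word);
  });
  return chunks.map((chunk) => chunk.join(" "));
}

const sentence =
  "An organisation should make best efforts to protect it's hardware and software.";

// With "to" as a Conjunction, "to protect" looks like the start of a new clause.
const withDefault = splitBeforeConjunctionVerb(sentence, baseLexicon);

// With "to" retagged as a Preposition, the rule no longer fires there.
const withOverride = splitBeforeConjunctionVerb(sentence, {
  ...baseLexicon,
  to: "Preposition",
});
```

In this toy version `withDefault` comes back as two chunks, broken right before "to protect", while `withOverride` keeps the sentence in one piece. The real output differs because compromise applies both of my `splitOn` patterns with its own tagger.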
hey Nikhil, yep you're right - looks like a mis-tagging by compromise in this case.
I'm happy to check it out for the next release
thanks for the heads-up
cheers
hey, longer answer this time:
the Penn Tagset has its own separate part-of-speech tag for TO, which I think is why it became a Conjunction in the test-set I used, and why we call it a conjunction by default in compromise. I changed it now, and a billion tests failed. This change should probably be in a major release.
Personally, I've never been clear on the difference - 'head and tail' vs 'head to tail'. I'd love to know if you (or anyone!) have any opinions on this, of any strength - they both seem to do the same thing, to me.
gonna punt this for now. Thank you for flagging it to me
cheers