spencermountain/compromise

"to" is a preposition and not a conjunction

NikhilVerma opened this issue · 2 comments

https://www.dictionary.com/browse/to

I am trying to build a sentence separator which can split a sentence if it has multiple verb or noun conjunctions.

My current approach is something like this:

	const conjunctionSplit = doc
		.splitOn("#Adverb? #Verb (#Conjunction|,)")
		.splitOn("(#Conjunction|,) #Adverb? #Verb");

However, a sentence like "An organisation should make best efforts to protect it's hardware and software." gets parsed as:

[
    "An organisation should make best efforts",
    "to protect",
    "it's",
    "hardware and",
    "software."
]

I think it should be parsed as:

[
    "An organisation should make best efforts to protect it's",
    "hardware and",
    "software."
]

My current workaround is to do this:

world.model.one.lexicon.to = "Preposition";

It's awesome that compromise lets me edit the lexicon so easily. But I think this should be updated in the main library as well.
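
To make the effect concrete, here is a toy sketch of why the tag on "to" decides whether the split pattern fires. This is not compromise's real tagger or matcher: the lexicon entries, the "Noun" fallback, and the helper names `tagOf` and `splitPoints` are all made up for illustration, and the pattern is simplified to a bare conjunction-then-verb pair.

```javascript
// Toy illustration only: this is NOT compromise's tagger or matcher.
// The lexicon entries and helper names below are hypothetical.
const baseLexicon = {
  to: "Conjunction", // the default tag this issue is about
  and: "Conjunction",
  protect: "Verb",
  make: "Verb",
};

// Look up a tag for a single word; unknown words fall back to "Noun".
function tagOf(word, lexicon) {
  return lexicon[word.toLowerCase()] || "Noun";
}

// Return the word indexes where a Conjunction is directly followed by a Verb,
// a simplified stand-in for "(#Conjunction|,) #Adverb? #Verb".
function splitPoints(sentence, lexicon) {
  const words = sentence.split(/\s+/);
  const points = [];
  for (let i = 0; i < words.length - 1; i++) {
    const isConjunction = tagOf(words[i], lexicon) === "Conjunction";
    const nextIsVerb = tagOf(words[i + 1], lexicon) === "Verb";
    if (isConjunction && nextIsVerb) {
      points.push(i);
    }
  }
  return points;
}

const sentence = "efforts to protect hardware and software";

// With "to" tagged as a Conjunction, "to protect" matches and causes a split.
const withDefault = splitPoints(sentence, baseLexicon);

// With "to" overridden to Preposition, nothing matches before a verb.
const withOverride = splitPoints(sentence, { ...baseLexicon, to: "Preposition" });

console.log(withDefault);  // [ 1 ]
console.log(withOverride); // []
```

In this toy model, "and software" never triggers a split because "software" is not a verb, so the only thing that changes between the two runs is the tag on "to".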

hey Nikhil, yep you're right - looks like a mis-tagging by compromise in this case.
I'm happy to check it out for the next release
thanks for the heads-up
cheers

hey, longer answer this time:
the Penn Tagset has a separate part-of-speech tag just for TO, which I think is why it became a Conjunction in the test-set I used, and why we call it a conjunction by default in compromise. I changed it now, and a billion tests failed. This change should probably be in a major release.

Personally, I've never been clear on the difference - 'head and tail' vs 'head to tail'. I'd love to know if you (or anyone!) have any opinions on this, of any strength - they both seem to do the same thing, to me.

gonna punt this for now. Thank you for flagging it to me
cheers