galaxykate/tracery

"a" modifier doesn't account for cases like "useful"

serin-delaunay opened this issue · 4 comments

In this grammar:

{
	"start": "#noun_phrase.a#",
	"noun_phrase": "useful tool like Tracery"
}

The expected output from start would be "a useful tool like Tracery", but instead we get "an useful tool like Tracery". I haven't tried Tracery 2, but looking at its altered a function I think the problem would remain (since the third letter isn't i).

The most reliable function I've found to do this job is in inflect.py, although its regex usage isn't especially readable.

The regex could be extended to use "a" instead of "an" for any /use.*/g match.

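The issue is slightly wider than the use prefix: "usable", "usurping", "usual", "uninformed", "unintelligent", "uninspected", and "uninteresting" are all (probably, I'm on my phone) unaccounted for in the present scheme. The first three take "a" despite starting with a vowel; the "unin-" words take "an", which a third-letter-i check like Tracery 2's would get wrong.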
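To make the failure modes concrete, here is a minimal sketch of the kind of heuristic discussed so far: a first-letter vowel check plus a special case for "u" followed by "i" as the third letter. The names `isVowel` and `naiveA` are hypothetical, and this is a reconstruction from the descriptions in this thread, not a copy of the repo's code.

```javascript
// Hypothetical reconstruction of the heuristic described in this thread.
function isVowel(ch) {
  return "aeiou".includes(ch.toLowerCase());
}

function naiveA(s) {
  if (s.length > 0) {
    // Special case: "u" with "i" as the third letter ("unique", "unicorn") gets "a".
    if (
      s.charAt(0).toLowerCase() === "u" &&
      s.length > 2 &&
      s.charAt(2).toLowerCase() === "i"
    ) {
      return "a " + s;
    }
    // Otherwise decide purely on spelling: vowel first letter gets "an".
    if (isVowel(s.charAt(0))) {
      return "an " + s;
    }
  }
  return "a " + s;
}

console.log(naiveA("unique"));                   // "a unique" (correct)
console.log(naiveA("useful tool like Tracery")); // "an useful tool like Tracery" (wrong)
console.log(naiveA("usual"));                    // "an usual" (wrong)
console.log(naiveA("uninformed"));               // "a uninformed" (wrong)
```

Under these assumptions the special case fixes "unique" but misfires in both directions on the words listed above.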

Checking whether the first letter is a vowel is wrong in general but covers most cases, as mentioned in the Stack Exchange question. Choosing between "a" and "an" depends on the pronunciation rather than the spelling.

a house
a unique
a US dollar
an FBI agent
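A rough sketch of a pronunciation-aware modifier that handles these four examples might look like the following. Everything here is an assumption for illustration: the pattern lists, the initialism rule, and the names `indefiniteArticle` and `a` are hypothetical, and the lists are deliberately small and incomplete.

```javascript
// Spelled with a vowel, pronounced with a consonant sound ("yoo", "wuh").
const CONSONANT_SOUND = /^(us[aeu]|uni[cfqt]|eu|one\b)/i;
// Spelled with a consonant, pronounced with a vowel sound (silent "h").
const VOWEL_SOUND = /^(hour|honest|honou?r|heir)/i;
// Letters whose spoken names start with a vowel sound ("eff", "aitch", ...).
const VOWEL_NAMED_LETTERS = "AEFHILMNORSX";

function indefiniteArticle(phrase) {
  const firstWord = phrase.trim().split(/\s+/)[0] || "";
  if (firstWord === "") return "a";
  // Treat an all-caps first word as an initialism that is read letter by letter.
  if (/^[A-Z]{2,}$/.test(firstWord)) {
    return VOWEL_NAMED_LETTERS.includes(firstWord[0]) ? "an" : "a";
  }
  if (CONSONANT_SOUND.test(firstWord)) return "a";
  if (VOWEL_SOUND.test(firstWord)) return "an";
  // Fall back to the spelling-based rule.
  return /^[aeiou]/i.test(firstWord) ? "an" : "a";
}

function a(phrase) {
  return indefiniteArticle(phrase) + " " + phrase;
}

console.log(a("house"));      // "a house"
console.log(a("unique"));     // "a unique"
console.log(a("US dollar"));  // "a US dollar"
console.log(a("FBI agent"));  // "an FBI agent"
```

This still guesses from spelling, so it will miss plenty of words, and acronyms that are pronounced as words rather than spelled out are handled incorrectly by the initialism rule.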

Basically you have to implement your own language modifier if you want to have specific language features in your app.

You can't cover the complexity of the English language rules in just a few lines of code; the modifier in this repo is just a template.

I forked the project and am working on extracting the modifiers so that people can build their own. See https://github.com/mycaule/epures/tree/master/modifiers

The methodology should be to write lots of unit tests to make sure the language rules you want (but sadly not every rule) are covered by the code.
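As a sketch of that approach, a table of input/expected pairs can be run against whichever modifier you are developing. The helper `failingCases` and the stand-in modifier `vowelOnly` are hypothetical names used only for illustration; the cases come from the examples in this thread.

```javascript
// Expected results for the rules we care about, one [input, expected] pair per case.
const CASES = [
  ["house", "a house"],
  ["unique", "a unique"],
  ["US dollar", "a US dollar"],
  ["FBI agent", "an FBI agent"],
  ["useful tool like Tracery", "a useful tool like Tracery"],
];

// Run every case through the modifier and return only the ones that fail.
function failingCases(modifier, cases) {
  return cases
    .map(([input, expected]) => ({ input, expected, actual: modifier(input) }))
    .filter((result) => result.actual !== result.expected);
}

// Stand-in modifier that only looks at the first letter; swap in your own.
const vowelOnly = (s) => (/^[aeiou]/i.test(s) ? "an " : "a ") + s;

for (const f of failingCases(vowelOnly, CASES)) {
  console.log(`FAIL ${JSON.stringify(f.input)}: expected "${f.expected}", got "${f.actual}"`);
}
```

With the first-letter-only stand-in, every case except "house" is reported as failing, which gives a concrete list of rules still to implement.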

In JS, language rules are implemented in the natural library, for example. See "stemmers":
https://github.com/NaturalNode/natural/tree/master/lib/natural/stemmers