"a" modifier doesn't account for cases like "useful"
serin-delaunay opened this issue · 4 comments
In this grammar:
{
"start": "#noun_phrase.a#",
"noun_phrase": "useful tool like Tracery"
}
The expected output from "start" would be "a useful tool like Tracery", but instead we get "an useful tool like Tracery". I haven't tried Tracery 2, but looking at its altered "a" function, I think the problem would remain (since the third letter isn't "i").
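The sketch below shows the rule as described above: "an" before a vowel letter, except when a word starts with "u" and its third letter is "i" (as in "unicorn"). It is rebuilt from that description and is not Tracery 2's actual source. The function name `aModifier` is hypothetical.

```javascript
// Hypothetical reconstruction of a first-letter "a"/"an" rule with the
// "u?i" special case described in this issue (not Tracery's actual source).
function aModifier(s) {
  if (s.length > 0) {
    const first = s.charAt(0).toLowerCase();
    // Special case: "u" whose third letter is "i", e.g. "unicorn".
    if (first === "u" && s.length > 2 && s.charAt(2).toLowerCase() === "i") {
      return "a " + s;
    }
    // Otherwise decide purely on whether the first letter is a vowel.
    if ("aeiou".includes(first)) {
      return "an " + s;
    }
  }
  return "a " + s;
}

console.log(aModifier("unicorn"));                  // → a unicorn
console.log(aModifier("useful tool like Tracery")); // → an useful tool like Tracery (the bug)
```

"useful" has "e" as its third letter, so it falls through to the plain vowel check and gets "an".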
The most reliable function I've found to do this job is in inflect.py, although its regex usage isn't especially readable.
The regex could be extended to use "a" instead of "an" for any /use.*/g match.
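The sketch below implements that suggestion. It assumes the check applies to the start of the phrase, so it uses /^use/i instead of the unanchored /use.*/g. An unanchored /use/ would also match "abuse". A regex with the g flag also keeps `lastIndex` state between `.test()` calls. The name `aModifierWithUse` is hypothetical.

```javascript
// Sketch: words starting with "use" (useful, user, used...) take "a";
// everything else falls back to the plain first-letter vowel check.
const USE_PREFIX = /^use/i; // anchored, no /g flag

function aModifierWithUse(s) {
  if (USE_PREFIX.test(s)) {
    return "a " + s;
  }
  if (s.length > 0 && "aeiou".includes(s.charAt(0).toLowerCase())) {
    return "an " + s;
  }
  return "a " + s;
}

console.log(aModifierWithUse("useful tool like Tracery")); // → a useful tool like Tracery
console.log(aModifierWithUse("abuse of power"));           // → an abuse of power
console.log(aModifierWithUse("usual suspect"));            // → an usual suspect (still wrong)
```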
The issue is slightly wider than the "use" prefix: "usable", "usurping", "usual", "uninformed", "unintelligent", "uninspected", and "uninteresting" are all (probably; I'm on my phone) unaccounted for in the present scheme.
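One way to cover these words is an ordered list of regex rules where the first match decides. This is loosely in the style of the inflect.py approach mentioned above. The patterns are a simplified sketch aimed at the words listed here, not copied from inflect.py. `RULES`, `chooseArticle`, and `aModifierRules` are hypothetical names.

```javascript
// Ordered rules: the first pattern that matches the word decides the article.
const RULES = [
  // "uni" + anything except n/m/d (or "mo") sounds like "you-ni": unique, unicorn.
  // "unin..." words such as "uninformed" do not match and fall through to "an".
  [/^uni([^nmd]|mo)/i, "a"],
  // "u" + consonant + vowel often sounds like "you": useful, usable, usual, usurping.
  [/^u[bcfghjkqrst][aeiou]/i, "a"],
  // Any other leading vowel letter takes "an".
  [/^[aeiou]/i, "an"],
];

function chooseArticle(word) {
  for (const [pattern, article] of RULES) {
    if (pattern.test(word)) {
      return article;
    }
  }
  return "a"; // default: leading consonant letter
}

function aModifierRules(phrase) {
  // Only the first word of the phrase decides the article.
  const firstWord = phrase.trim().split(/\s+/)[0];
  return chooseArticle(firstWord) + " " + phrase;
}

for (const w of ["useful", "usable", "usurping", "usual", "uninformed",
                 "unintelligent", "uninspected", "uninteresting", "unique"]) {
  console.log(aModifierRules(w));
}
// Prints one per line: a useful, a usable, a usurping, a usual, an uninformed,
// an unintelligent, an uninspected, an uninteresting, a unique
```

Each new exception becomes a new rule, and the order of the rules matters. That is part of why this style gets hard to read as it grows.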
The rule that checks whether the first letter is a vowel is wrong, but it covers most cases, as mentioned in the Stack Exchange question. Choosing between "a" and "an" depends on pronunciation rather than spelling:
a house
a unique
a US dollar
an FBI agent
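Because the choice follows pronunciation, a sketch can special-case a few things spelling alone gets wrong. All-caps abbreviations are judged by how their first letter's name is spoken, which gives "an FBI agent" but "a US dollar". Small hand-written word lists handle the rest, such as "hour" (silent "h") and "unique" ("you" sound). The word lists and function names are illustrative assumptions and are far from complete. Treating every all-caps word as spelled out is also wrong for acronyms read as words, such as "NASA".

```javascript
// Letters whose spoken names begin with a vowel sound ("eff", "aitch", "ess", ...).
const VOWEL_SOUND_LETTERS = new Set("AEFHILMNORSX");

// Illustrative exception prefixes (far from complete).
const AN_PREFIXES = ["hour", "honest", "honor", "honour", "heir"]; // silent "h"
const A_PREFIXES = ["unique", "unicorn", "unit", "univers", "euro"]; // "you" sound

function articleFor(word) {
  // All-caps words are assumed to be read letter by letter ("FBI", "US").
  if (/^[A-Z]+$/.test(word)) {
    return VOWEL_SOUND_LETTERS.has(word[0]) ? "an" : "a";
  }
  const lower = word.toLowerCase();
  if (AN_PREFIXES.some((p) => lower.startsWith(p))) return "an";
  if (A_PREFIXES.some((p) => lower.startsWith(p))) return "a";
  return /^[aeiou]/.test(lower) ? "an" : "a";
}

function aModifierSpoken(phrase) {
  const firstWord = phrase.trim().split(/\s+/)[0];
  return articleFor(firstWord) + " " + phrase;
}

console.log(aModifierSpoken("house"));     // → a house
console.log(aModifierSpoken("unique"));    // → a unique
console.log(aModifierSpoken("US dollar")); // → a US dollar
console.log(aModifierSpoken("FBI agent")); // → an FBI agent
console.log(aModifierSpoken("hour"));      // → an hour
```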
Basically, you have to implement your own language modifier if you want language-specific features in your app.
You can't capture the complexity of English language rules in just a few lines of code; the modifier in this repo is just a template.
I forked the project and am working on extracting the modifiers so people can build their own. See https://github.com/mycaule/epures/tree/master/modifiers
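The toy expander below illustrates keeping modifiers separate from the grammar engine. It takes the modifier functions as a parameter, so a language-specific set can be swapped in. It is neither Tracery's API nor the fork's code. `expand` and `englishModifiers` are hypothetical, and the "#symbol.modifier#" handling is simplified: rules are plain strings, with no arrays and no random choice.

```javascript
// Toy expander: replaces "#symbol#" or "#symbol.mod1.mod2#" using a grammar of
// plain-string rules, applying modifiers looked up in a caller-supplied table.
function expand(grammar, text, modifiers, depth = 0) {
  if (depth > 20) throw new Error("expansion too deep");
  return text.replace(/#([^#]+)#/g, (_, tag) => {
    const [symbol, ...mods] = tag.split(".");
    if (!Object.prototype.hasOwnProperty.call(grammar, symbol)) {
      return "((" + symbol + "))"; // marker for a missing symbol
    }
    let value = expand(grammar, grammar[symbol], modifiers, depth + 1);
    for (const name of mods) {
      const fn = modifiers[name];
      if (typeof fn !== "function") throw new Error("unknown modifier: " + name);
      value = fn(value);
    }
    return value;
  });
}

// A modifier set is just a table of functions; users can supply their own.
const englishModifiers = {
  a: (s) => (/^use/i.test(s) || !/^[aeiou]/i.test(s) ? "a " : "an ") + s,
  capitalize: (s) => s.charAt(0).toUpperCase() + s.slice(1),
};

const grammar = {
  start: "#noun_phrase.a.capitalize#",
  noun_phrase: "useful tool like Tracery",
};

console.log(expand(grammar, grammar.start, englishModifiers)); // → A useful tool like Tracery
```

Supporting another language or a better English "a" rule would then mean passing a different table, without touching `expand`.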
The methodology should be to write lots of unit tests to make sure the language rules you want (though, sadly, not every rule) are covered by the code.
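The sketch below shows that methodology as a table of (input, expected) cases run against whichever modifier is under test. It reports every failure rather than stopping at the first. The case list comes from examples in this thread. The naive first-letter modifier used as the subject and the names `CASES`, `runArticleTests`, and `naiveA` are illustrative assumptions.

```javascript
// Table of [input phrase, expected output] pairs taken from this thread.
const CASES = [
  ["house", "a house"],
  ["apple", "an apple"],
  ["useful tool like Tracery", "a useful tool like Tracery"],
  ["usual suspect", "a usual suspect"],
  ["uninformed guess", "an uninformed guess"],
  ["hour", "an hour"],
  ["FBI agent", "an FBI agent"],
  ["US dollar", "a US dollar"],
];

// Run every case and collect failures instead of stopping at the first one.
function runArticleTests(modifier, cases) {
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = modifier(input);
    if (actual !== expected) failures.push({ input, expected, actual });
  }
  return failures;
}

// Subject under test: the naive "leading vowel letter => an" rule.
const naiveA = (s) => (/^[aeiou]/i.test(s) ? "an " : "a ") + s;

const failures = runArticleTests(naiveA, CASES);
console.log(`${CASES.length - failures.length}/${CASES.length} passed`); // → 3/8 passed
for (const f of failures) {
  console.log(`  ${JSON.stringify(f.input)}: expected ${JSON.stringify(f.expected)}, got ${JSON.stringify(f.actual)}`);
}
```

The same table can be pointed at any candidate modifier. Each newly reported wrong article then becomes one more row.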
In JS, for example, language rules are implemented in the library natural. See its "stemmers":
https://github.com/NaturalNode/natural/tree/master/lib/natural/stemmers