jaraco/inflect

Incorrect indefinite article "an" returned when handling uncommon abbreviations

tonywu7 opened this issue · 5 comments

Version

inflect==5.3.0 python==3.9.6

Problem

inflect.engine.a() returns the article "an" instead of "a" for some abbreviations that do not begin with vowel phonemes. Observe:

>>> import inflect
>>> p = inflect.engine()
>>> p.a('JSON code block')
'a JSON code block'
>>> p.a('YAML code block')
'an YAML code block'
>>> p.a('Core ML function')
'an Core ML function'
>>> p.a('TLS connection')
'an TLS connection'
>>> p.a('CSS selector')
'an CSS selector'

Seems to be caused by the A_abbrev regex:

inflect/inflect.py

Lines 1806 to 1813 in 98e19e3

A_abbrev = re.compile(
    r"""
(?! FJO | [HLMNS]Y. | RY[EO] | SQU
  | ( F[LR]? | [HL] | MN? | N | RH? | S[CHKLMNPTVW]? | X(YL)?) [AEIOU])
[FHLMNRSX][A-Z]
""",
    re.VERBOSE,
)

which is missing the ^ start-of-line assertion, and so it matches letter pairs that would require "an" at the start of an abbreviation, even though they are not at the beginning of the word:

>>> A_abbrev.search('TLS connection')
<re.Match object; span=(1, 3), match='LS'>
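To make the effect of the missing anchor easier to see, here is a minimal sketch that compiles the pattern body quoted above twice, once as-is and once with a leading ^. The names ABBREV_BODY, unanchored, anchored, and matches are hypothetical and exist only for this illustration. Prepending "^" is just one way to express the anchor, not necessarily how the library applies the fix, and the sketch checks only this one regex rather than the full logic of a().

```python
import re

# Pattern body copied from the A_abbrev excerpt quoted in this issue.
ABBREV_BODY = r"""
(?! FJO | [HLMNS]Y. | RY[EO] | SQU
  | ( F[LR]? | [HL] | MN? | N | RH? | S[CHKLMNPTVW]? | X(YL)?) [AEIOU])
[FHLMNRSX][A-Z]
"""

# Hypothetical names: the pattern as quoted, and the same pattern with "^" prepended.
unanchored = re.compile(ABBREV_BODY, re.VERBOSE)
anchored = re.compile("^" + ABBREV_BODY, re.VERBOSE)


def matches(pattern, text):
    """Return True if the pattern is found anywhere search() is allowed to look."""
    return pattern.search(text) is not None


phrases = [
    "JSON code block",
    "YAML code block",
    "Core ML function",
    "TLS connection",
    "CSS selector",
    "SSL connection",
    "RSA algorithm",
]

for phrase in phrases:
    # Print whether each variant of the pattern fires for the phrase.
    print(phrase, matches(unanchored, phrase), matches(anchored, phrase))
```

With the anchor in place, search() can only succeed at position 0, so of these phrases only "SSL connection" and "RSA algorithm" still match.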

After fixing it:

>>> p.a('JSON code block')
'a JSON code block'
>>> p.a('YAML code block')
'a YAML code block'
>>> p.a('Core ML function')
'a Core ML function'
>>> p.a('TLS connection')
'a TLS connection'
>>> p.a('SSL connection')
'an SSL connection'
>>> p.a('RSA algorithm')
'an RSA algorithm'

Seems reasonable. Can you supply a patch?

I've added tonywu7's modification at https://github.com/kimgerdes/inflect (the smallest fork of all time, just a little ^)

@kimgerdes is it okay if I use your fork, and does it have the same license as this repo? I am running into the same issue. Thank you!

sure thing. same license :)

Fixed in 6.0.3