jdevera/pylabeador

Erratic behaviour of the letter 'y'

Opened this issue · 2 comments

Hi again,

While using this wonderful implementation of your program, I spotted a behaviour that some people (depending on whom you ask) could consider erratic.

Of course, it is understandable that handling this letter is a challenge, as it appears mostly in foreign terms. Nevertheless, for the cases I present here, I believe there is an easy, simple and non-debatable way to syllabify these words, even when the pronunciation is also borrowed from another language.

Note: DRAE stands for Diccionario de la Real Academia Española (some of the terms have an alternative spelling adapted to the Spanish language, like bypass (baipás) or curry (curri)).

First, we have the errors raised when the 'y' sits between two consonants:

pylabeador.syllabify('Tytonidae') #Family of birds, could have used Tyto too
Traceback (most recent call last):
...
raise HyphenatorError("Nucleus expects a vowel!", word)
...

pylabeador.syllabify('bypass') #Appears in DRAE
raise HyphenatorError("Nucleus expects a vowel!", word)

pylabeador.syllabify('byroniano') #Appears in DRAE
raise HyphenatorError("Nucleus expects a vowel!", word)

On the other hand, we have this one, which I reckon is open to debate:

pylabeador.syllabify('byte') #Appears in DRAE
raise HyphenatorError("Nucleus expects a vowel!", word)

One may think that a single syllable, pronounced like 'bait' (in Spanish), is the correct way to proceed. My personal point of view is that this 'y' should behave like an 'i', so that in Spanish two syllables arise.
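The "'y' behaves like an 'i'" reading can be pictured as a small preprocessing step. The sketch below is only an illustration and not how pylabeador works internally. The helper names `vocalic_y_to_i` and `syllabify_with_vocalic_y` are hypothetical, and the rule "a 'y' with no vowel next to it acts as a vowel" is an assumption made for this example. The syllabifier is passed in as a parameter, so the only thing assumed about the library is that `syllabify(word)` returns a list of syllables whose concatenation has the same length as the input.

```python
VOWELS = set("aeiouáéíóúü")


def vocalic_y_to_i(word):
    """Replace every 'y' that has no vowel on either side with 'i'.

    Assumption for illustration: such a 'y' is acting as a vowel.
    """
    chars = list(word)
    lower = word.lower()
    for idx, ch in enumerate(lower):
        if ch != "y":
            continue
        prev_is_vowel = idx > 0 and lower[idx - 1] in VOWELS
        next_is_vowel = idx + 1 < len(lower) and lower[idx + 1] in VOWELS
        if not prev_is_vowel and not next_is_vowel:
            # Preserve the case of the original letter.
            chars[idx] = "I" if word[idx].isupper() else "i"
    return "".join(chars)


def syllabify_with_vocalic_y(word, syllabify):
    """Syllabify the 'i' variant, then restore the original spelling.

    `syllabify` is any callable taking a word and returning a list of
    syllables, for example pylabeador.syllabify.
    """
    syllables = syllabify(vocalic_y_to_i(word))
    restored, pos = [], 0
    for syllable in syllables:
        # Slice the original word using the syllable lengths.
        restored.append(word[pos:pos + len(syllable)])
        pos += len(syllable)
    return restored
```

With this, `syllabify_with_vocalic_y('byte', pylabeador.syllabify)` would hand 'bite' to the library, while 'cónyuge' and 'coadyuvar' would be passed through unchanged because their 'y' is followed by a vowel.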

Second, its behaviour when preceded by 'rr':

pylabeador.syllabify('curry') #Appears in DRAE
['cur', 'ry']

Which, I do not know, could be related to...

Third:

pylabeador.syllabify('cónyuge') #Appears in DRAE, very common word
['có', 'nyu', 'ge']

Interestingly, the following word works fine, possibly because of the two strong vowels 'oa': the word is split in such a way that each of the consonants 'd' and 'y' is matched with a vowel:

pylabeador.syllabify('coadyuvar') #Appears in DRAE
['co', 'ad', 'yu', 'var']
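To re-check all the words from this report in one go (for example after a fix), a small harness along these lines may help. `collect_results` is a hypothetical helper, not part of pylabeador. It takes the syllabifier as an argument, so passing `pylabeador.syllabify` is the only library call assumed, and it catches exceptions generically rather than guessing where `HyphenatorError` can be imported from.

```python
# Words taken from the examples in this report.
WORDS = [
    "Tytonidae", "bypass", "byroniano", "byte",
    "curry", "cónyuge", "coadyuvar",
]


def collect_results(words, syllabify):
    """Map each word to its hyphenated split, or to the error it raised."""
    results = {}
    for word in words:
        try:
            results[word] = "-".join(syllabify(word))
        except Exception as exc:  # deliberately broad: this is a report, not a fix
            results[word] = f"ERROR: {type(exc).__name__}: {exc}"
    return results
```

Calling `collect_results(WORDS, pylabeador.syllabify)` and printing the dictionary would list, per word, either the split that was produced or the error that was raised.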

I apologize for not providing better explanations of the observed behaviour. I hope that these examples are helpful for you.

Thank you very much for your work,
M.

Wow, nice findings, which I somehow managed to miss notifications for. The handling of the letter Y is a rather flaky topic in the algorithm that I followed. This might actually warrant another attempt from me to contact the people behind that algorithm, who now run an online service that seems to get all these cases right: https://tulengua.es/silabas/

I hope this is not blocking your work, as I will not be able to work on this very soon.

I just checked the online service that you pointed me to. While it handles most of these cases well, their tool still splits 'curry' as cur-ry. An error in the underlying algorithm, and not only in the implementation, therefore cannot be ruled out.

Don't worry about how long the fix takes. I use your wonderful tool as part of a personal, non-profit hobby project for which I just need it to syllabify more or less correctly, so I can definitely carry on despite this issue. Nevertheless, I like to report errors so that the tool can be improved for future uses, as well as to make the authors aware of possible bugs.

I understand that the 'y' case is very tricky, so thank you again for your work.