mekras/php-speller

Aspell output parsing for Swedish

mmetayer opened this issue · 2 comments

Hello,

I get an error when parsing the output of Aspell for Swedish sentences that include words containing a colon (:). In Swedish and Finnish, words can contain colons; see https://en.wikipedia.org/w/index.php?title=Colon_(punctuation)

Here is the output of aspell:

$ aspell --lang=sv --encoding=UTF-8 -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.7-20110707)
S:t Petersburg är i Ryssland
& S:t 23 0: St, Set, Sot, Söt, Stl, Stå, Sy, Ät, Åt, Est, Ost, Öst, SI, SM, TT, Sa, Se, Sk, So, Så, Ut, Yt, SJ
? Petersburg 0 4: Peters
*
*
*

The issue comes from here: because the line contains more than one colon, $parts has more than 2 elements, and at line 192 $parts[0][3] is not set. This results in a notice, or even a fatal error if strict_types is enabled (PHP Fatal error: Uncaught TypeError: trim() expects parameter 1 to be string, null given).
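
To illustrate the failure mode, here is a minimal sketch. It assumes the parser first splits the line on `:` and then splits the first part on spaces; that strategy is inferred from the description above, not copied from the library, and the variable `$head` is hypothetical.

```php
<?php
// Misspelling line from the Aspell session above (suggestion list shortened).
$line = '& S:t 23 0: St, Set, Sot';

// Assumed parsing strategy: split on every colon.
$parts = explode(':', $line);
// $parts is ['& S', 't 23 0', ' St, Set, Sot']: three elements instead of two.

// Splitting the first part on spaces yields only the marker and half the word.
$head = explode(' ', $parts[0]);
// $head is ['&', 'S'], so the index that should hold the offset does not exist.
$offsetIsSet = isset($head[3]);
```

For a word without a colon, such as `& Petersburg 1 4: Peters`, the same steps would give four elements in `$head`, which is why the problem only shows up with words like `S:t`.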

I guess a regexp would be more suitable in this case. I can try to do a PR if needed (though I'm not sure how to implement the tests yet...)
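
As a rough idea of what the regexp approach could look like, here is a sketch, not the library's actual code. The function name `parseAspellLine` and the shape of the returned array are hypothetical. It relies on the fact that in `&` and `?` lines the header ends with the first colon that directly follows the numeric offset, so colons inside the word itself are harmless.

```php
<?php
declare(strict_types=1);

/**
 * Parse one line of `aspell -a` output (hypothetical helper).
 * Returns null for lines that carry no misspelling, such as "*" or the banner.
 */
function parseAspellLine(string $line): ?array
{
    // "& word count offset: sugg1, sugg2" or "? word count offset: guess1"
    if (preg_match('/^([&?]) (\S+) (\d+) (\d+): (.*)$/u', $line, $m) === 1) {
        return [
            'word'        => $m[2],
            'offset'      => (int) $m[4],
            'suggestions' => array_map('trim', explode(',', $m[5])),
        ];
    }

    // "# word offset": misspelled word without any suggestion
    if (preg_match('/^# (\S+) (\d+)$/u', $line, $m) === 1) {
        return ['word' => $m[1], 'offset' => (int) $m[2], 'suggestions' => []];
    }

    return null;
}

$result = parseAspellLine('& S:t 23 0: St, Set, Sot');
```

Because `\S+` matches the whole word up to the next space, `S:t` stays in one piece, and only the text after `offset: ` is split on commas.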

Hi @mmetayer thanks for reaching out.

Well this was unexpected. Today I learned that colon is not a good separator ;)
It would be nice if you could try to create a fix for this. I am happy to give some tips on how to add tests for this case in the PR 👍

Thanks for the quick reply @icanhazstring, I'll try to do a PR as soon as I have time, hopefully this evening 🙂