Aspell output parsing for Swedish
mmetayer opened this issue · 2 comments
Hello,
I get an error when parsing the output of Aspell for Swedish sentences with words containing a colon (:). In Swedish and Finnish, words can contain colons; see https://en.wikipedia.org/w/index.php?title=Colon_(punctuation)
Here is the output of Aspell:
$ aspell --lang=sv --encoding=UTF-8 -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.7-20110707)
S:t Petersburg är i Ryssland
& S:t 23 0: St, Set, Sot, Söt, Stl, Stå, Sy, Ät, Åt, Est, Ost, Öst, SI, SM, TT, Sa, Se, Sk, So, Så, Ut, Yt, SJ
? Petersburg 0 4: Peters
*
*
*
The issue comes from here: as there is more than one colon in the line, $parts contains more than two elements, and at line 192 $parts[0][3] is not set. This results in a notice, and even a fatal error if strict_types is enabled (PHP Fatal error: Uncaught TypeError: trim() expects parameter 1 to be string, null given).
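To illustrate the extra element, here is a minimal sketch. It assumes the parser splits each line on every colon with explode(); that is my reading of the behaviour, not a copy of the library code, and the sample line is shortened from the output above.

```php
<?php
declare(strict_types=1);

// Hypothetical reproduction, not the library's actual code: split an
// Aspell "&" line on every colon, as a naive parser might.
$line = '& S:t 23 0: St, Set, Sot';

$parts = explode(':', $line);

// The colon inside "S:t" produces a third element, so the header is
// no longer wholly contained in $parts[0].
var_dump(count($parts)); // int(3)
var_dump($parts[0]);     // string(3) "& S"
```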
I guess a regexp would be more suitable in this case. I can try to do a PR if needed (though I'm not sure how to implement the tests yet...)
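A rough sketch of what the regexp approach could look like. The function name parseAspellLine and the returned array keys are made up for illustration and are not part of the library; the pattern only assumes the "& word count offset: suggestions" layout visible in the output above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): parse one "&" or "?"
 * line of `aspell -a` output into word, count, offset and suggestions.
 * Returns null for lines that do not match, such as "*" or the banner.
 */
function parseAspellLine(string $line): ?array
{
    // \S+ lets the word itself contain colons; only the colon that
    // follows the two numeric fields ends the header.
    $pattern = '/^[&?]\s+(\S+)\s+(\d+)\s+(\d+):\s*(.*)$/u';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // Split the suggestion list on commas and strip surrounding spaces.
    $suggestions = $m[4] === '' ? [] : array_map('trim', explode(',', $m[4]));

    return [
        'word' => $m[1],
        'count' => (int) $m[2],
        'offset' => (int) $m[3],
        'suggestions' => $suggestions,
    ];
}

$result = parseAspellLine('& S:t 23 0: St, Set, Sot');
var_dump($result['word']);        // string(3) "S:t"
var_dump($result['suggestions']); // array with "St", "Set", "Sot"
```

Because the word is matched as a run of non-whitespace characters, a colon inside it no longer matters; only the colon right after the offset is treated as the separator.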
Hi @mmetayer thanks for reaching out.
Well, this was unexpected. Today I learned that a colon is not a good separator ;)
It would be nice if you could try to create a fix for this. I am happy to give some tips on how to add tests for this case in the PR 👍
Thanks for the quick response @icanhazstring, I'll try to do a PR as soon as I have time, hopefully this evening 🙂