Incorrect removal of accents when preceded by a capital letter
ViliamVadocz opened this issue · 3 comments
ViliamVadocz commented
>>> import cleantext
>>> cleantext.clean("všetko")
'vsetko'
>>> cleantext.clean("Všetko")
'va!etko'
>>> cleantext.__version__
'0.4.0'
ViliamVadocz commented
unidecode itself does not have this issue, so something must be going wrong before it is called.
>>> import unidecode
>>> unidecode.unidecode("všetko")
'vsetko'
>>> unidecode.unidecode("Všetko")
'Vsetko'
ViliamVadocz commented
The issue seems to be with the fix_bad_unicode function.
>>> from cleantext import fix_bad_unicode
>>> fix_bad_unicode("všetko")
'všetko'
>>> fix_bad_unicode("Všetko")
'VÅ¡etko'
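For anyone reading along, the garbled output above has the shape of classic mojibake: it is exactly what you get when the UTF-8 bytes of "Všetko" are decoded as Latin-1. The sketch below only demonstrates that round trip with the standard library. The helper name utf8_read_as_latin1 is made up for illustration, and this is not a claim about what fix_bad_unicode does internally.

```python
def utf8_read_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Latin-1.

    Hypothetical helper for illustration only; it is not part of cleantext.
    """
    return text.encode("utf-8").decode("latin-1")


original = "Všetko"

# "š" (U+0161) is the two bytes 0xC5 0xA1 in UTF-8. Read as Latin-1,
# those bytes become the two characters U+00C5 and U+00A1.
garbled = utf8_read_as_latin1(original)
print([hex(ord(ch)) for ch in garbled[:3]])

# The mis-decoding is reversible: encode as Latin-1, decode as UTF-8.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == original)
```

If transliteration and lowercasing then run on the garbled string, the two stray characters would plausibly turn into "a" and "!", which would be consistent with the 'va!etko' result in the first comment.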
jfilter commented
Thanks for reporting! This is fixed in the upcoming release.