UnicodeDecodeErrors with non-ascii hyphen chars
mark-kubacki opened this issue · 2 comments
mark-kubacki commented
chr(173) is also known as &shy; (the soft hyphen, U+00AD) and is used in HTML to mark hyphenation points.
import pyphen
dic = pyphen.Pyphen(lang='en_US')
dic.inserted('crocodile', hyphen=chr(173))
… results in UnicodeDecodeError: 'ascii' codec can't decode…
mark-kubacki commented
Works if used with u'\xad'.
liZe commented
There's no real documentation yet, but Pyphen only works with ASCII for non-unicode (byte) strings. You must use unicode strings when you have non-ASCII characters.
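For readers who land here, a minimal sketch of the underlying behaviour, written for Python 3 and not using Pyphen itself. It assumes the error comes from Python 2's implicit ASCII decoding when a byte string such as chr(173) is mixed with unicode text; that is an inference from the error message, not something confirmed against Pyphen's source. The helper names and the hand-picked word parts are hypothetical.

```python
# Sketch of the failure mode discussed in this thread (runs on Python 3).
# On Python 2, chr(173) is the byte string '\xad'. Mixing it with unicode
# text makes Python 2 decode it as ASCII implicitly, which fails.

SOFT_HYPHEN_BYTES = b"\xad"   # what chr(173) produces on Python 2
SOFT_HYPHEN_TEXT = u"\xad"    # the unicode soft hyphen, U+00AD


def decode_as_ascii(raw):
    """Mimic Python 2's implicit byte-string-to-unicode coercion."""
    return raw.decode("ascii")


def join_parts(parts, hyphen):
    """Hypothetical helper: join word parts with a unicode hyphen."""
    return hyphen.join(parts)


# The byte 0xAD is outside the ASCII range, so decoding raises.
try:
    decode_as_ascii(SOFT_HYPHEN_BYTES)
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# With a unicode hyphen no decoding is needed and the join succeeds.
# The parts are chosen by hand for illustration, not produced by Pyphen.
hyphenated = join_parts([u"croco", u"dile"], SOFT_HYPHEN_TEXT)
```

On Python 3, chr(173) already returns a unicode string, so the original snippet should not hit this particular error there.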