Kozea/Pyphen

UnicodeDecodeErrors with non-ascii hyphen chars

mark-kubacki opened this issue · 2 comments

chr(173) is the soft hyphen (U+00AD), also known as &shy;, and is used in HTML to mark hyphenation points.

import pyphen

dic = pyphen.Pyphen(lang='en_US')
dic.inserted('crocodile', hyphen=chr(173))

… results in UnicodeDecodeError: 'ascii' codec can't decode….

Works if used with u'\xad'.

liZe commented

There's no real documentation yet, but with non-unicode (byte) strings Pyphen only works with ASCII. If you have non-ASCII characters, you must use unicode strings.
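To illustrate the point above, here is a minimal sketch of the likely mechanism. It assumes the error comes from Python 2 implicitly decoding a byte string as ASCII when it is combined with unicode text; that this is what happens inside Pyphen is an assumption, not something confirmed in this thread. The sketch does not import Pyphen, and the names decode_as_ascii and pieces are hypothetical.

```python
# -*- coding: utf-8 -*-
# Sketch only: runs on Python 2 and 3 and does not import Pyphen.

SOFT_HYPHEN_BYTES = b'\xad'   # what chr(173) returns on Python 2
SOFT_HYPHEN_TEXT = u'\xad'    # U+00AD as a unicode string


def decode_as_ascii(raw):
    """Do explicitly what Python 2 does implicitly when str meets unicode."""
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError:
        return None  # 0xAD is outside the ASCII range 0x00-0x7F


# Hypothetical word pieces, for illustration only (not Pyphen's real output).
pieces = [u'croco', u'dile']
joined = SOFT_HYPHEN_TEXT.join(pieces)

print(decode_as_ascii(SOFT_HYPHEN_BYTES))  # None: the byte is not ASCII
print(repr(joined))
```

On Python 3, chr(173) already returns a unicode string, so the distinction only matters on Python 2.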