Unicode

Question

mahsash opened this issue 8 years ago · 3 comments

Hi.
I am having a problem training my own model on a Persian dataset. It raises an error at the beginning of the training phase. My dataset is UTF-8 encoded and in CoNLL-2003 format. Does Glample support UTF-8? If so, what else could be the problem?
The error:
    File "loader.py", line 43, in update_tag_scheme
        'Please check sentence %i:\n%s' % (i, s_str))
    Exception: <exception str() failed>

Thanks

Answer 1 · 2017-04-11T18:09:47.000Z

You might need to change the encoding in loader.py from 'utf8' to whatever encoding your data actually uses. For example, I used 'latin-1' for Spanish and German.
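A minimal sketch of what that change amounts to, assuming loader.py reads the dataset through an open call with a hard-coded encoding. The function name read_conll and its encoding parameter are hypothetical, made up for illustration, and are not the repository's actual code:

```python
import io


def read_conll(path, encoding="utf-8"):
    """Read a CoNLL-style file into a list of sentences.

    Hypothetical helper: each sentence is a list of rows, and each row is
    the whitespace-separated columns of one line (word first, tag last).
    """
    sentences, current = [], []
    # io.open takes an explicit encoding on both Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                # A blank line ends the current sentence.
                if current:
                    sentences.append(current)
                    current = []
            else:
                current.append(line.split())
    if current:
        sentences.append(current)
    return sentences
```

For a UTF-8 file you would call read_conll(path), and for a Latin-1 file read_conll(path, encoding="latin-1"). If the file decodes cleanly with the right encoding and the error persists, the tag column of the reported sentence is worth checking next.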

Answer 2 · 2017-09-04T12:40:53.000Z

Hi, I am also having the same issue, with an English dataset. My dataset, stanfordSentimentTreebank, is encoded in UTF-8, and I am using the GoogleNews pretrained word embeddings, which come as a .gz file:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
Kindly guide me, as I am stuck on this error.

Answer 3 · 2017-09-04T12:41:55.000Z

@dungtn Can you please help me solve this issue?