Unicode error at line #31 in embeddings.py
sawan16 opened this issue · 3 comments
sawan16 commented
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 0: surrogates not allowed
artetxem commented
This obviously looks like an encoding problem, but I would need more details to know where it happens. Please report the full stack trace.
SouravDutta91 commented
Sometimes the 'utf-8' codec fails while encoding or decoding certain symbols or letters. In those cases, you can either ignore such errors by passing errors='ignore' along with the encoding, or try a different encoding such as latin-1 (also known as ISO-8859-1). Hope this helps.
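To make the options concrete, here is a minimal sketch in plain Python. It assumes the lone surrogate '\udcf6' came from the byte 0xF6 being decoded as UTF-8 with errors='surrogateescape', which is a common cause of this message but is not confirmed for this issue. The variable names are illustrative only.

```python
# Assumption: the input contains the single byte 0xF6 ('ö' in latin-1),
# which is not valid UTF-8 on its own.
raw = b"\xf6"

# Decoding with surrogateescape maps the undecodable byte to a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode of that string raises the error from the report.
try:
    text.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = str(exc)

# Option 1: drop characters that cannot be encoded (lossy).
ignored = text.encode("utf-8", errors="ignore")

# Option 2: restore the original byte by encoding with the same handler.
restored = text.encode("utf-8", errors="surrogateescape")

# Option 3: read the bytes with the encoding they were actually written in.
as_latin1 = raw.decode("latin-1")
```

Note that errors='ignore' silently discards data, and that switching to latin-1 only helps when the file is read: encoding a string that already contains a surrogate to latin-1 fails as well.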
suman101112 commented
The input embedding model is not in the correct format. Use model.save_word2vec_format(filename) to save the fastText or word2vec model.
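For reference, the word2vec text format is a header line with the vocabulary size and the vector dimensionality, followed by one line per word with its space-separated values. The sketch below writes that layout in plain Python. write_word2vec_text is a hypothetical helper and the vectors are made up; it only illustrates the text (non-binary) variant and is not the library's implementation.

```python
import io


def write_word2vec_text(vectors, out):
    """Write {word: [floats]} in word2vec text format (hypothetical helper)."""
    dims = {len(vec) for vec in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must have the same dimensionality")
    dim = dims.pop()
    # Header line: "<vocabulary size> <dimensionality>"
    out.write(f"{len(vectors)} {dim}\n")
    # One line per word: the word, then its components separated by spaces.
    for word, vec in vectors.items():
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


# Made-up toy vectors, written to an in-memory buffer instead of a file.
toy = {"hello": [0.1, 0.2, 0.3], "world": [0.4, 0.5, 0.6]}
buf = io.StringIO()
write_word2vec_text(toy, buf)
lines = buf.getvalue().splitlines()
```

When writing to a real file, opening it with encoding='utf-8' explicitly avoids depending on the platform default.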