explosion/spaCy

UnicodeDecodeError when loading spaCy

dterg opened this issue · 9 comments

dterg commented

Whether I use:

from spacy.en import English
nlp = English()

or

import spacy
nlp = spacy.load('en')

I get the error:

nlp = spacy.load('en')
return cls(path=path, **overrides)
if 'vocab' not in overrides \
lemmatizer = cls.create_lemmatizer(nlp)
return Lemmatizer.load(nlp.path)
rules = json.load(file_)
return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 565: character maps to <undefined>

Could this be an encoding issue, since Python 2.7 handles encoding differently than 3.x? Although, if I recall correctly, I used spaCy on Python 2.7 without any issues before.
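For what it's worth, the error class is easy to reproduce without spaCy at all: byte 0x9d is one of the few positions with no character assigned in cp1252, Windows' default locale codec, so decoding UTF-8 data that contains it with the platform default fails exactly like this (a standalone sketch, not spaCy's actual code path):

```python
# Byte 0x9d has no character assigned in cp1252, the default locale codec
# on a stock Windows install, so decoding it raises the same error as in
# the traceback above.
try:
    b"\x9d".decode("cp1252")
except UnicodeDecodeError as err:
    print(err.reason)  # character maps to <undefined>
```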

Thanks. I must have opened the file incorrectly. I need to add tests for different encoding environment variables to my Travis config.
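For reference, the kind of fix this implies is opening the lemmatizer's JSON rules with an explicit encoding rather than the locale default. A minimal sketch, where `load_rules` and `rules_path` are hypothetical names, not spaCy's actual API:

```python
import io
import json

def load_rules(rules_path):
    # io.open behaves the same on Python 2.7 and 3.x and lets us force
    # UTF-8, so the read no longer depends on the Windows locale codec
    # (cp1252), which cannot decode all UTF-8 byte sequences.
    with io.open(rules_path, "r", encoding="utf8") as file_:
        return json.load(file_)
```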

Are you using Python 2.7 or Python 3.5?

dterg commented

Python 2.7(.12)

While I fix the bug, try running:

export LC_ALL=en_US.UTF8
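The reason this workaround can help: Python derives the default codec for text-mode files from the locale, which you can inspect directly (a small check, independent of spaCy):

```python
# Python picks the default text-file codec from the locale. On a stock
# Windows install this typically reports 'cp1252'; after forcing a UTF-8
# locale it reports a UTF-8 codec, which matches the model's JSON files.
import locale
print(locale.getpreferredencoding())
```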
dterg commented

Sorry, I forgot to add that I'm running on Windows.

Ah. I hope you'll continue reporting problems if you have them :). We're running a bit blind on Windows at the moment.

I've gotten the CI to test the null encoding environment now, and I've turned the test from red to green. I'll push the fix to PyPI.

dterg commented

Updated through pip (spaCy 1.0.3), but it seems the issue is persisting :/

The fixed version is 1.0.4 — I think you nipped in just ahead of the upload. Try now.

dterg commented

Worked like a charm. Thank you for taking the time to fix this!

lock commented

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.