UnicodeDecodeError when loading spacy
dterg opened this issue · 9 comments
Whether I use:
from spacy.en import English
nlp = English()
or
import spacy
nlp = spacy.load('en')
I get the error:
nlp = spacy.load('en')
    return cls(path=path, **overrides) if 'vocab' not in overrides \
    lemmatizer = cls.create_lemmatizer(nlp)
    return Lemmatizer.load(nlp.path)
    rules = json.load(file_)
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 565: character maps to <undefined>
Could this be an encoding issue, since Python 2.7 handles encoding differently from 3.x? Although if I recall correctly, I've used spaCy on Python 2.7 without any issues before.
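For what it's worth, the traceback points at the rules file being decoded with the locale's default codec rather than UTF-8. Here's a minimal sketch that reproduces the same 'charmap' failure on a machine whose default codec is cp1252 (the file name and contents are made up for illustration, not taken from spaCy):

import io
import json

# Write a UTF-8 JSON file containing curly quotes; the UTF-8 bytes of
# the closing quote (e2 80 9d) include 0x9d, which is undefined in cp1252.
with io.open('rules.json', 'w', encoding='utf8') as f:
    f.write(json.dumps({'word': u'\u201cquoted\u201d'}, ensure_ascii=False))

# Opening without an explicit encoding falls back to the locale default
# (cp1252 on most Windows setups), so the read raises UnicodeDecodeError
# there; on a UTF-8 locale the same code succeeds.
with io.open('rules.json') as f:
    rules = json.load(f)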
Thanks. I must have opened the file incorrectly. I need to add tests for different encoding environment variables to my Travis config.
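The fix will presumably be along these lines (a sketch, not the exact spaCy code; load_rules and the path handling are illustrative):

import io
import json

def load_rules(path):
    # Pass the encoding explicitly instead of relying on the locale
    # default, so behaviour is the same on Windows, Linux and macOS.
    with io.open(path, 'r', encoding='utf8') as file_:
        return json.load(file_)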
Are you using Python 2.7 or Python 3.5?
Python 2.7(.12)
While I fix the bug, try doing
export LC_ALL=en_US.UTF8
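That works because the default text-file codec comes from the locale. You can check which codec your interpreter picks up with:

import locale
# 'cp1252' on a typical Windows install; usually 'UTF-8' on Linux/macOS.
print(locale.getpreferredencoding())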
Sorry, I forgot to add that I'm running on Windows.
Ah. I hope you'll continue reporting problems if you have them :). We're running a bit blind on Windows at the moment.
I've gotten the CI to test the null-encoding environment now, and I've turned the test from red to green. I'll push the fix to PyPI.
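For the curious, such a regression test is roughly of this shape (a sketch only; the test name and details are illustrative, not the actual spaCy test):

import os
import subprocess
import sys

def test_load_with_null_locale():
    # Force a C/POSIX locale so the default codec is ASCII-like,
    # then load spaCy in a subprocess; the load must not raise.
    env = dict(os.environ, LC_ALL='C', LANG='C')
    subprocess.check_call(
        [sys.executable, '-c', "import spacy; spacy.load('en')"],
        env=env)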
Updated through pip (spaCy 1.0.3), but it seems like the issue is persisting :/
The fixed version is 1.0.4 — I think you nipped in just ahead of the upload. Try now.
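You can double-check which version pip actually installed with:

import pkg_resources
# Should print 1.0.4 once the upgrade has gone through.
print(pkg_resources.get_distribution('spacy').version)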
Worked like a charm. Thank you for taking the time to fix this!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.