Python 2.7 and unicode files
Closed this issue · 2 comments
JaimieMurdock commented
Handling a file that contains Unicode text under Python 2.7 raises an error, which surfaces through concurrent.futures:
Traceback (most recent call last):
  File "/home/jammurdo/anaconda3/envs/py27/bin/topicexplorer", line 11, in <module>
    sys.exit(main())
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/topicexplorer/__main__.py", line 223, in main
    args.config_file = benchmark(init.main)(args)
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/topicexplorer/init.py", line 269, in main
    sentences=args.sentences, tokenizer=args.tokenizer)
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/topicexplorer/init.py", line 182, in build_corpus
    simple=simple, tokenizer=tokenizer)
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/vsm/extensions/corpusbuilders/corpusstreamers.py", line 70, in corpus_from_files
    corpus = [f.result() for f in corpus]
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15615: ordinal not in range(128)
JaimieMurdock commented
vsm.extensions.corpusbuilders.corpusstreamers was not decoding UTF-8 files properly.
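
For anyone hitting the same error elsewhere, here is a minimal sketch of the general pattern, not the actual change made in vsm. It assumes the input files are UTF-8 encoded; read_text and the temporary demo file are hypothetical names used only for illustration. The bytes 0xc3 0xa9 are the UTF-8 encoding of "é", and decoding them as ASCII fails the same way the traceback above does, whereas opening the file with an explicit encoding returns unicode text on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Return the file's contents as unicode text, decoding explicitly.

    Hypothetical helper for illustration; not part of vsm or topicexplorer.
    """
    # io.open decodes while reading on both Python 2.7 and Python 3,
    # so no implicit ASCII decode is attempted later on.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Write the UTF-8 bytes for "café" to a temporary file for the demonstration.
raw = b"caf\xc3\xa9"
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Decoding the raw bytes as ASCII fails on byte 0xc3, as in the traceback.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    ascii_error = str(err)

# Decoding explicitly as UTF-8 succeeds and yields four characters.
text = read_text(demo_path)
os.remove(demo_path)
```

Decoding once at the point where the file is read keeps the rest of the pipeline working with unicode text, so worker functions run through futures never trigger an implicit ASCII conversion.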
JaimieMurdock commented
I've resolved this; the fix will be folded into the 1.0b204 and 0.4.10 releases.