inpho/topic-explorer

Python 2.7 and unicode files

Closed this issue · 2 comments

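Building a corpus from a file that contains non-ASCII (UTF-8) text fails under Python 2.7 with a UnicodeDecodeError. The exception is raised in a worker and surfaces through concurrent.futures when the result is collected: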

Traceback (most recent call last):
  File "/home/jammurdo/anaconda3/envs/py27/bin/topicexplorer", line 11, in <module>
    sys.exit(main())
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/topicexplorer/__main__.py", line 223, in main
    args.config_file = benchmark(init.main)(args)
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/topicexplorer/init.py", line 269, in main
    sentences=args.sentences, tokenizer=args.tokenizer)
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/topicexplorer/init.py", line 182, in build_corpus
    simple=simple, tokenizer=tokenizer)
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/vsm/extensions/corpusbuilders/corpusstreamers.py", line 70, in corpus_from_files
    corpus = [f.result() for f in corpus]
  File "/home/jammurdo/anaconda3/envs/py27/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15615: ordinal not in range(128)

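vsm.extensions.corpusbuilders.corpusstreamers was not decoding UTF-8 files properly. Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, "é" is 0xc3 0xa9), so under Python 2.7 the raw bytes were being decoded implicitly with the default ASCII codec instead of as UTF-8.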
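
For anyone hitting the same error elsewhere, here is a minimal sketch of the general remedy: decode explicitly while reading. It is not the actual vsm patch. `read_text` and `demo_path` are hypothetical names used only for illustration, and the sketch assumes the input files are UTF-8 encoded.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: read a file and return decoded text.

    io.open decodes while reading on both Python 2.7 and Python 3,
    so no implicit ASCII decode happens later in the pipeline.
    """
    with io.open(path, mode="r", encoding=encoding) as f:
        return f.read()


# Write a small UTF-8 file containing the same lead byte (0xc3) as in the traceback.
raw = u"caf\u00e9".encode("utf-8")  # the bytes 'caf' followed by 0xc3 0xa9
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

text = read_text(demo_path)
os.remove(demo_path)

# Reproduce the failure mode: decoding the same bytes as ASCII raises.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Opening the file with an explicit encoding keeps the decode step in one place, rather than relying on Python 2's implicit str-to-unicode conversion.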

I've resolved this; the fix will be included in the 1.0b204 and 0.4.10 releases.