cfiltnlp/pyiwn

'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>

CharviG opened this issue · 4 comments

While integrating pyiwn with my own product, I am getting the following error:

2020-08-27:12:31:07,306 INFO [pythServer.py:239] Evaluation of script Completed.
2020-8-27 12:31:07,3 ERROR [2020-08-27 12:31:07,306] INFO in pythServer: Evaluation of script Completed.
2020-8-27 13:03:13,5 ERROR 2020-08-27:13:03:13,578 INFO [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...
2020-8-27 13:03:28,6 ERROR --- Logging error ---
2020-8-27 13:03:28,6 ERROR 2020-08-27:13:03:28,661 ERROR [pythServer.py:235] 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
2020-8-27 13:03:28,6 ERROR Traceback (most recent call last):
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\PythService\pythServer.py", line 200, in evaluatePythonScript
2020-8-27 13:03:28,6 ERROR exec(str1,globals(),locals())
2020-8-27 13:03:28,6 ERROR File "<string>", line 13, in <module>
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\site-packages\pyiwn\__init__.py", line 13, in <module>
2020-8-27 13:03:28,6 ERROR if not download():
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\site-packages\pyiwn\helpers.py", line 39, in download
2020-8-27 13:03:28,6 ERROR sys.stdout.write('\r[{}{}]'.format('\u2588' * done, '.' * (50 - done)))
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\encodings\cp1252.py", line 19, in encode
2020-8-27 13:03:28,6 ERROR return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2020-8-27 13:03:28,6 ERROR UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
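
The encode failure can be reproduced without pyiwn. The sketch below assumes the service's stdout uses cp1252 (as the cp1252.py frame in the traceback suggests) and imitates it with an in-memory stream; `write_bar` is a hypothetical helper, not part of pyiwn, and it writes the same progress-bar string as helpers.py with done = 1.

```python
import io

def write_bar(encoding):
    """Write one progress-bar frame to an in-memory text stream and return the bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    done = 1  # one filled block, as in the first frame of the download bar
    stream.write('\r[{}{}]'.format('\u2588' * done, '.' * (50 - done)))
    stream.flush()
    return raw.getvalue()

failed = False
start = None
try:
    write_bar("cp1252")  # cp1252 has no mapping for U+2588 (full block)
except UnicodeEncodeError as exc:
    failed = True
    start = exc.start  # index of the offending character in the string
    print(exc)

# The same write succeeds when the stream encoding is UTF-8.
utf8_bytes = write_bar("utf-8")
```

If the host process allows it, setting the PYTHONIOENCODING=utf-8 environment variable or calling sys.stdout.reconfigure(encoding='utf-8') (available from Python 3.7) may avoid the error. Whether either works inside this particular service is an assumption, since the service may replace sys.stdout with its own stream.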

The following command also gives an error:
>>> iwn = pyiwn.IndoWordNet()
2020-08-27:16:35:07,116 INFO [iwn.py:43] Loading hindi language synsets...
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
iwn = pyiwn.IndoWordNet()
File "C:\Python\lib\site-packages\pyiwn\iwn.py", line 45, in __init__
self._synset_df = self._load_synset_file(lang.value)
File "C:\Python\lib\site-packages\pyiwn\iwn.py", line 51, in _load_synset_file
synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
File "C:\Python\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

I am using Python 3.7.9.

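As a temporary fix, open iwn.py, find _load_synset_file(self, lang), and change f = open(filename) to f = open(filename, encoding='utf-8'). Without an explicit encoding, open() on Windows falls back to the locale default (cp1252 here), which cannot decode the UTF-8 data files. I made this change in my local copy of iwn.py using the PyCharm IDE.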
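
A minimal sketch of why the explicit encoding matters, independent of pyiwn: the file name and sample text are made up for illustration, and cp1252 is requested explicitly to imitate the Windows default. The Hindi virama (U+094D) is stored in UTF-8 as the bytes E0 A5 8D, and 0x8d has no mapping in cp1252, which matches the byte reported in the traceback.

```python
import os
import tempfile

text = "शब्द\n"  # sample Hindi word; contains the virama U+094D

cp1252_failed = False
bad_byte = None

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "synsets.txt")  # hypothetical file name
    with open(filename, "w", encoding="utf-8") as f:
        f.write(text)

    # Imitate open(filename) on a cp1252 Windows machine.
    try:
        with open(filename, encoding="cp1252") as f:
            f.readlines()
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        bad_byte = exc.object[exc.start]  # the byte cp1252 could not decode

    # The fix: state the encoding the data was written in.
    with open(filename, encoding="utf-8") as f:
        lines = f.readlines()
```

Passing encoding='utf-8' makes the read independent of the machine's locale, so the same code behaves the same way on Windows and Linux.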

@ArupDas15 I have changed it. Please uninstall the pip package, clone this repo, and run python setup.py install.