cfiltnlp/pyiwn

'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

gokul427 opened this issue · 6 comments

iwn = pyiwn.IndoWordNet()

2022-11-06:13:21:14,789 INFO [iwn.py:43] Loading hindi language synsets...

UnicodeDecodeError Traceback (most recent call last)
Cell In [5], line 2
1 # language defaults to Hindi
----> 2 iwn = pyiwn.IndoWordNet()

File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:45, in IndoWordNet.__init__(self, lang)
43 logger.info(f'Loading {lang.value} language synsets...')
44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
46 self._synset_relations_dict = self._load_synset_relations()

File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:51, in IndoWordNet._load_synset_file(self, lang)
49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
53 synset_df = synset_df.dropna()

File ~\anaconda3\envs\py38torch\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>
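The cp1252.py frame at the bottom of the traceback shows which codec open() picked when pyiwn called it without an encoding. The sketch below is one way to check that default on your own machine. It also shows Python's UTF-8 Mode (the PYTHONUTF8=1 environment variable, Python 3.7+), which is an alternative not mentioned in this thread: it makes a bare open() default to UTF-8 without editing site-packages, but only for an interpreter started with the variable already set. The two helper function names are hypothetical, written for illustration.

```python
import codecs
import locale
import os
import subprocess
import sys


def default_open_encoding() -> str:
    """Hypothetical helper: normalized codec name a bare open() uses in text mode."""
    # open() without encoding= falls back to the locale's preferred encoding,
    # e.g. cp1252 on many Windows setups, as in the traceback above.
    return codecs.lookup(locale.getpreferredencoding(False)).name


def default_open_encoding_in_utf8_mode() -> str:
    """Hypothetical helper: same check in a child interpreter started with PYTHONUTF8=1."""
    env = dict(os.environ, PYTHONUTF8="1")
    code = (
        "import codecs, locale; "
        "print(codecs.lookup(locale.getpreferredencoding(False)).name)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("default:", default_open_encoding())
print("with PYTHONUTF8=1:", default_open_encoding_in_utf8_mode())
```

If the first line prints cp1252, any library that opens UTF-8 text files without an explicit encoding can fail this way.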

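In line 50, instead of f = open(filename), use f = open(filename, encoding='utf-8'). Without an explicit encoding, open() uses the locale's default encoding, which is cp1252 here (note cp1252.py in the traceback), but the synset file is UTF-8.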
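A minimal, self-contained reproduction of the failure and the fix, assuming the synset file is UTF-8-encoded Devanagari text. The sample line and the helper read_lines_utf8 are hypothetical; they are not pyiwn's data or code. One plausible source of byte 0x8d: the virama (U+094D) encodes to the bytes E0 A5 8D in UTF-8, and 0x8D has no mapping in cp1252.

```python
import os
import tempfile


def read_lines_utf8(path):
    """Hypothetical helper mirroring the suggested change to line 50 of iwn.py."""
    # Pass the encoding explicitly; the with-block also closes the file,
    # which the original bare open() call never does.
    with open(path, encoding="utf-8") as f:
        return f.readlines()


# Hypothetical sample line; the virama in this word encodes to E0 A5 8D in UTF-8.
sample = "1\tनमस्ते\tnoun\n"

with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

failed_byte = None
try:
    # Forcing cp1252 reproduces what a bare open() does under a cp1252 locale.
    with open(path, encoding="cp1252") as f:
        f.readlines()
except UnicodeDecodeError as err:
    failed_byte = err.object[err.start]
    print("cp1252 failed:", err.reason, "at byte", hex(failed_byte))

lines = read_lines_utf8(path)
print("utf-8 read:", lines)
os.remove(path)
```

Decoding the same bytes as UTF-8 succeeds, which is why adding encoding='utf-8' to the open() call is enough.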

Hi, I am getting the same error. Is there a way to resolve it?

Open pyiwn\iwn.py in your site-packages folder (the path is shown in the traceback) and change line 50 from f = open(filename) to f = open(filename, encoding='utf-8').

It's not working. How can I solve this?

UnicodeDecodeError Traceback (most recent call last)
Cell In[4], line 3
1 import pyiwn
2 pyiwn.download()
----> 3 iwn = pyiwn.IndoWordNet()

File c:\g\pproject\aimbot\bot\bot\lib\site-packages\pyiwn\iwn.py:45, in IndoWordNet.__init__(self, lang)
43 logger.info(f'Loading {lang.value} language synsets...')
44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
46 self._synset_relations_dict = self._load_synset_relations()

File c:\g\pproject\aimbot\bot\bot\lib\site-packages\pyiwn\iwn.py:51, in IndoWordNet._load_synset_file(self, lang)
49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
53 synset_df = synset_df.dropna()

File ~\anaconda3\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>
This is my error message.

Can you please help me resolve this?

I also changed line 50 to use UTF-8 encoding.

Sorry, it worked. I forgot to restart my environment.
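The restart matters because a module that is already imported is not re-read from disk when its source file changes. Restarting the kernel is the simplest and most reliable fix. As a sketch of the alternative, importlib.reload re-executes a module's edited source in place. The throwaway module demo_patch_mod below is hypothetical and stands in for the hand-edited library file. For pyiwn, the traceback shows IndoWordNet is defined in pyiwn/iwn.py, so presumably importlib.reload(pyiwn.iwn) followed by pyiwn.iwn.IndoWordNet() would pick up the patch; the package-level name pyiwn.IndoWordNet may still point at the old class.

```python
import importlib
import os
import sys
import tempfile

# Skip writing .pyc files so reload() always recompiles the edited source.
sys.dont_write_bytecode = True

# Hypothetical throwaway module standing in for a library file in site-packages.
mod_dir = tempfile.mkdtemp()
mod_path = os.path.join(mod_dir, "demo_patch_mod.py")
with open(mod_path, "w", encoding="utf-8") as f:
    f.write("ENCODING = None\n")

sys.path.insert(0, mod_dir)
import demo_patch_mod  # noqa: E402  (imported after sys.path is set up)

# Edit the source on disk, as when patching line 50 by hand.
with open(mod_path, "w", encoding="utf-8") as f:
    f.write("ENCODING = 'utf-8'\n")

before = demo_patch_mod.ENCODING                   # still the value from the first import
after = importlib.reload(demo_patch_mod).ENCODING  # re-executes the edited file
print(before, after)
```

Reloading only refreshes the module you pass in, so objects already created from the old code (such as an existing IndoWordNet instance) keep their old behavior until they are rebuilt.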