Issue running python extract_vocab.py
DevalNaik opened this issue · 3 comments
Error while loading the GloVe word embeddings.
Logs:
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Traceback (most recent call last):
  File "extract_vocab.py", line 23, in <module>
    use_small=USE_SMALL)
  File "C:\Users\SQLNet\sqlnet\utils.py", line 274, in load_word_emb
    for idx, line in enumerate(inf):
  File "C:\Users\miniconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2438: character maps to <undefined>
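A minimal reproduction of this error, assuming the GloVe file is UTF-8 encoded and that open() fell back to the Windows locale default, cp1252. Byte 0x9d has no mapping in cp1252, so any UTF-8 character whose encoding contains that byte stops the read. The right double quotation mark U+201D (bytes e2 80 9d) is used here only as an example of such a character.

```python
import locale

# On Windows, open() without encoding= falls back to the locale's preferred
# encoding, commonly cp1252 on US/Western European setups.
print("locale default:", locale.getpreferredencoding(False))

# U+201D (right double quotation mark) is stored in UTF-8 as the bytes e2 80 9d.
raw = "\u201d".encode("utf-8")

cp1252_error = None
try:
    raw.decode("cp1252")  # 0xe2 -> 'â' and 0x80 -> '€' decode, but 0x9d has no mapping
except UnicodeDecodeError as err:
    cp1252_error = err  # keep a reference; `err` is cleared when the except block ends
    print("cp1252:", err)

# The same bytes decode cleanly as UTF-8 (ascii() keeps the output console-safe).
print("utf-8:", ascii(raw.decode("utf-8")))
```

The cp1252 attempt fails on byte 0x9d with the same 'charmap' message as the traceback; only the position differs, because the real file hits the byte further in.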
Execution now starts after the following change in utils.py at line 273:
with open(file_name, encoding="utf8") as inf:
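The change above pins the encoding passed to open(). For reference, here is a stand-alone sketch of a GloVe reader with the encoding fixed the same way. It is not SQLNet's actual load_word_emb, whose body isn't shown in this thread. The name load_glove and the max_words parameter are hypothetical; max_words stands in loosely for the use_small flag. The sketch assumes the usual GloVe text layout: a token followed by space-separated floats on each line.

```python
from typing import Dict, List, Optional


def load_glove(file_name: str, max_words: Optional[int] = None) -> Dict[str, List[float]]:
    """Hypothetical GloVe reader: returns {token: vector}.

    encoding="utf-8" makes decoding independent of the platform's locale
    default (cp1252 on many Windows setups).
    """
    embeddings: Dict[str, List[float]] = {}
    with open(file_name, encoding="utf-8") as inf:
        for idx, line in enumerate(inf):
            if max_words is not None and idx >= max_words:
                break  # optional early stop, loosely like a "use_small" mode
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            embeddings[word] = [float(v) for v in values]
    return embeddings
```

Pinning the encoding in code keeps the script portable. An alternative that needs no code change on Python 3.7+ is UTF-8 mode (`python -X utf8 extract_vocab.py`, or setting PYTHONUTF8=1), which makes UTF-8 the default encoding for open().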
Check your folder structure for the data: is your train_tok.jsonl directly under the data folder, or at data/data/train_tok.jsonl?
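That layout check can be scripted. This is a minimal sketch, assuming extract_vocab.py is run from the repository root, so that the relative paths in the log (data/train_tok.jsonl and so on) resolve against the current directory. check_data_layout is a hypothetical helper, not part of SQLNet.

```python
from pathlib import Path

# File names taken from the "Loading data from data/..." lines in the log.
SPLITS = (
    "train_tok.jsonl", "train_tok.tables.jsonl",
    "dev_tok.jsonl", "dev_tok.tables.jsonl",
    "test_tok.jsonl", "test_tok.tables.jsonl",
)


def check_data_layout(root: str = ".") -> dict:
    """Hypothetical helper: report where each expected data file lives."""
    base = Path(root)
    report = {}
    for name in SPLITS:
        if (base / "data" / name).is_file():
            report[name] = "ok: data/" + name
        elif (base / "data" / "data" / name).is_file():
            # A common result of extracting an archive into an existing data/ folder.
            report[name] = "nested: move data/data/" + name + " up one level"
        else:
            report[name] = "missing"
    return report


if __name__ == "__main__":
    for name, status in check_data_layout().items():
        print(f"{name}: {status}")
```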
thanks for editing @DevalNaik