xiaojunxu/SQLNet

extract_vocab.py : all input arrays must have the same shape

Closed issue · 1 comment

Running extract_vocab.py raises this error:

(base) C:\Users\Albel\Documents\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from %s data/train_tok.jsonl
Loading data from %s data/train_tok.tables.jsonl
Loading data from %s data/dev_tok.jsonl
Loading data from %s data/dev_tok.tables.jsonl
Loading data from %s data/test_tok.jsonl
Loading data from %s data/test_tok.tables.jsonl
Loading word embedding from %s glove/glove.42B.300d.txt
Length of word vocabulary: %d 1917495
Length of used word vocab: %s 39936
Traceback (most recent call last):
  File "extract_vocab.py", line 62, in <module>
    emb_array = np.stack(embs, axis=0)
  File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 353, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

The error comes from NumPy's stack function. In Python 3, map returns a lazy iterator rather than a list, so for each word load_word_emb stores an array wrapping a map object instead of a vector of length N_word, and np.stack then receives arrays whose shapes do not match. To fix it, use the Python 3 form of the following line in the load_word_emb function in sqlnet/utils.py:

Python v2:
ret[info[0]] = np.array(map(lambda x:float(x), info[1:]))

Python v3:
ret[info[0]] = np.array(list(map(lambda x:float(x), info[1:])))
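
To see why wrapping the call in list() matters, here is a minimal sketch for Python 3. The sample line and the variable names `line`, `lazy`, `eager`, and `direct` are made up for illustration and are not taken from the repository. The last variant, which lets NumPy parse the strings itself, is only a suggested alternative and not something the project uses.

```python
import numpy as np

# Hypothetical GloVe-style line: a token followed by its vector components.
line = "the 0.1 -0.2 0.3"
info = line.strip().split(" ")

# Python 3: map() returns an iterator, so NumPy wraps the iterator object
# itself in a 0-dimensional object array instead of reading the values.
lazy = np.array(map(float, info[1:]))

# Materialising the iterator first gives the expected 1-D float vector.
eager = np.array(list(map(float, info[1:])))

# Alternative (assumption, not from the repository): let NumPy parse the strings.
direct = np.array(info[1:], dtype=np.float64)

print(lazy.shape, lazy.dtype)    # () object
print(eager.shape, eager.dtype)  # (3,) float64

# Stacking the two reproduces the shape mismatch reported above.
try:
    np.stack([eager, lazy], axis=0)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

If any entry passed to np.stack was built the lazy way while others are real vectors, the shapes differ and the ValueError from the traceback appears.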

Let me know if it helped.