extract_vocab.py : all input arrays must have the same shape

Question

Closed this issue 6 years ago · 1 comments

Running extract_vocab.py raises the following error:

(base) C:\Users\Albel\Documents\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from %s data/train_tok.jsonl
Loading data from %s data/train_tok.tables.jsonl
Loading data from %s data/dev_tok.jsonl
Loading data from %s data/dev_tok.tables.jsonl
Loading data from %s data/test_tok.jsonl
Loading data from %s data/test_tok.tables.jsonl
Loading word embedding from %s glove/glove.42B.300d.txt
Length of word vocabulary: %d 1917495
Length of used word vocab: %s 39936
Traceback (most recent call last):
  File "extract_vocab.py", line 62, in <module>
    emb_array = np.stack(embs, axis=0)
  File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 353, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

Answer 1 · 2018-08-13T16:16:28.000Z

The error is raised by NumPy's stack function. In Python 3, map returns an iterator rather than a list, so load_word_emb stores each embedding as a 0-dimensional object array wrapping the map object instead of a vector of length N_Word. Those entries do not match the shape of the other arrays being stacked, hence the ValueError. To fix it, change the following line in the load_word_emb function in sqlnet/utils.py:

Python 2 (original line):
ret[info[0]] = np.array(map(lambda x:float(x), info[1:]))

Python 3 (wrap map in list):
ret[info[0]] = np.array(list(map(lambda x:float(x), info[1:])))
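To see why the list() wrapper matters, here is a minimal sketch that reproduces the shape mismatch outside of SQLNet. The sample line, the 3-dimensional vectors, and the try_stack helper are made up for illustration, and it assumes (not confirmed in this thread) that extract_vocab.py stacks the loaded vectors together with ordinary zero vectors created via np.zeros.

```python
import numpy as np

# Hypothetical stand-in for one parsed GloVe line: a token followed by its vector.
info = "the 0.1 0.2 0.3".split(" ")

# Python 3: map() returns a lazy iterator, so NumPy wraps the iterator itself
# in a 0-dimensional object array instead of reading the numbers out of it.
broken = np.array(map(lambda x: float(x), info[1:]))

# Materializing the iterator first yields the intended 1-D float vector.
fixed = np.array(list(map(lambda x: float(x), info[1:])))

# Assumed placeholder vector, standing in for the zero embeddings of special tokens.
zero = np.zeros(3, dtype=np.float32)


def try_stack(vectors):
    """Hypothetical helper: return the stacked shape, or the error message."""
    try:
        return np.stack(vectors, axis=0).shape
    except ValueError as exc:
        return str(exc)


print(broken.shape, broken.dtype)   # shape and dtype of the wrapped iterator
print(fixed.shape, fixed.dtype)     # shape and dtype of the real vector
print(try_stack([zero, broken]))    # mismatched shapes
print(try_stack([zero, fixed]))     # matching shapes
```

Under Python 2 the first form works because map already returns a list, which is why the original code only breaks after switching interpreters.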

Let me know if it helped.