xiaojunxu/SQLNet

Help needed

arcontechnologies opened this issue · 9 comments

Is there anyone who can share working code, or at least a trained model? I went through the whole installation and I'm ending up with errors.
Thanks.

Please post the specific error that you are facing.

Hi,

I'm having this error. I'm not able to go further, as it's blocking, and I didn't change anything in the code. The only difference is that I'm using Python 3.6 instead of 2.7.

When executing extract_vocab.py, it raised this error:

(base) C:\Users\Albel\Documents\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from %s data/train_tok.jsonl
Loading data from %s data/train_tok.tables.jsonl
Loading data from %s data/dev_tok.jsonl
Loading data from %s data/dev_tok.tables.jsonl
Loading data from %s data/test_tok.jsonl
Loading data from %s data/test_tok.tables.jsonl
Loading word embedding from %s glove/glove.42B.300d.txt
Length of word vocabulary: %d 1917495
Length of used word vocab: %s 39936
Traceback (most recent call last):
File "extract_vocab.py", line 62, in
emb_array = np.stack(embs, axis=0)
File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 353, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

The error you are getting comes from numpy's stack function. In Python 3, map returns a lazy iterator rather than a list, so np.array(map(...)) produces a 0-d object array instead of a vector of length N_word, and the arrays passed to np.stack no longer share a shape. To resolve this, change the line below in the load_word_emb function in sqlnet/utils.py:

Python v2:
ret[info[0]] = np.array(map(lambda x:float(x), info[1:]))

Python v3:
ret[info[0]] = np.array(list(map(lambda x:float(x), info[1:])))
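
For anyone curious why the one-line change matters, here is a minimal, self-contained reproduction (the three-value vector is made up for illustration; real GloVe vectors are 300-dimensional). If the script ever stacks one of these 0-d entries next to a real N_word-length vector (e.g. a zero fallback), np.stack raises the shape error:

import numpy as np

info = "the 0.1 0.2 0.3".split()

# Python 3: map() is a lazy iterator, so np.array() wraps it in a
# 0-d object array instead of parsing it into a float vector.
bad = np.array(map(float, info[1:]))          # shape (), dtype object
good = np.array(list(map(float, info[1:])))   # shape (3,), dtype float64

try:
    np.stack([bad, np.zeros(3)], axis=0)
except ValueError as e:
    print(e)  # all input arrays must have the same shape
print(np.stack([good, np.zeros(3)], axis=0).shape)  # (2, 3)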

Let me know if it helped.

Now I'm having another error:

C:\Users\albel\Downloads\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Length of word vocabulary: 1917480
Length of used word vocab: 39355
Traceback (most recent call last):
File "extract_vocab.py", line 65, in
np.save(open('glove/usedwordemb.npy', 'w'), emb_array)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\lib\npyio.py", line 509, in save
pickle_kwargs=pickle_kwargs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\lib\format.py", line 555, in write_array
version)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\lib\format.py", line 328, in _write_array_header
fp.write(header_prefix)
TypeError: write() argument must be str, not bytes

Please make the update as below:

np.save(open('glove/usedwordemb.npy', 'w'), emb_array) -> np.save(open('glove/usedwordemb.npy', 'wb'), emb_array)
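
As a side note, the explicit file handle can be avoided entirely: when np.save is given a path instead of an open file, it opens the file itself in binary mode, so there is no mode flag to get wrong. A minimal sketch (the array here is a placeholder):

import numpy as np

emb_array = np.zeros((2, 300))  # placeholder for the real used-word embedding
np.save('glove/usedwordemb.npy', emb_array)  # numpy handles binary mode itself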

@koolcoder007 It works! Thanks for your help.

I'm coming back again because running python train.py --ca gives me the error below. Any insight?
Thanks.

C:\Users\albel\Downloads\SQLNet>python train.py --ca
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Using fixed embedding
Traceback (most recent call last):
File "train.py", line 57, in
gpu=GPU, trainable_emb = args.train_emb)
File "C:\Users\albel\Downloads\SQLNet\sqlnet\model\sqlnet.py", line 43, in init
self.agg_pred = AggPredictor(N_word, N_h, N_depth, use_ca=use_ca)
File "C:\Users\albel\Downloads\SQLNet\sqlnet\model\modules\aggregator_predict.py", line 18, in init
dropout=0.3, bidirectional=True)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\torch\nn\modules\rnn.py", line 425, in init
super(LSTM, self).init('LSTM', *args, **kwargs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\torch\nn\modules\rnn.py", line 52, in init
w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
TypeError: new() received an invalid combination of arguments - got (float, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (float, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (float, int)

Fixed by casting the sizes to int (the same cast applies to every nn.LSTM construction in the model):
self.cond_num_lstm = nn.LSTM(input_size=int(N_word), hidden_size=int(N_h/2),
num_layers=int(N_depth), batch_first=True,
dropout=0.3, bidirectional=True)
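
The underlying cause is the change in division semantics: in Python 2, N_h/2 is integer (floor) division, while in Python 3 it produces a float, which recent PyTorch versions reject as a tensor size. A minimal sketch with illustrative sizes:

import torch.nn as nn

N_word, N_h, N_depth = 300, 100, 2

# N_h / 2 == 50.0 in Python 3; torch.Tensor(gate_size, layer_input_size)
# inside nn.LSTM then fails with the (float, int) error above.
# Floor division (or int()) keeps the size an integer:
lstm = nn.LSTM(input_size=N_word, hidden_size=N_h // 2,
               num_layers=N_depth, batch_first=True,
               dropout=0.3, bidirectional=True)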


The same binary-mode fix should also be applied in sqlnet/utils.py, where the embedding is read back:

with open('glove/usedwordemb.npy') as inf -> with open('glove/usedwordemb.npy', 'rb') as inf
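
For completeness, the corrected read-back then looks like this (a minimal sketch; np.load also accepts the path directly, which avoids the mode flag):

import numpy as np

# .npy is a binary format, so the reader needs 'rb' just as the writer needs 'wb':
with open('glove/usedwordemb.npy', 'rb') as inf:
    used_word_emb = np.load(inf)
print(used_word_emb.shape)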