xiaojunxu/SQLNet

Help needed

arcontechnologies opened this issue · 9 comments

Is there anyone who can share working code, or at least a trained model? I went through the whole installation and I'm ending up with errors.
Thanks.

Please post the specific error that you are facing.

Hi,

I'm having this error. I'm not able to go further, as it's blocking, and I didn't change anything in the code. The only difference is that I'm using Python 3.6 instead of 2.7.

When executing extract_vocab.py, it raised this error:

(base) C:\Users\Albel\Documents\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from %s data/train_tok.jsonl
Loading data from %s data/train_tok.tables.jsonl
Loading data from %s data/dev_tok.jsonl
Loading data from %s data/dev_tok.tables.jsonl
Loading data from %s data/test_tok.jsonl
Loading data from %s data/test_tok.tables.jsonl
Loading word embedding from %s glove/glove.42B.300d.txt
Length of word vocabulary: %d 1917495
Length of used word vocab: %s 39936
Traceback (most recent call last):
File "extract_vocab.py", line 62, in
emb_array = np.stack(embs, axis=0)
File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 353, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

The error you are getting comes from numpy's stack function. In Python 3, map returns a lazy iterator rather than a list, so np.array(map(...)) produces a 0-d object array instead of a vector of length N_word, and the arrays passed to np.stack no longer share a shape. To resolve this, change the line below in the load_word_emb function in sqlnet/utils.py:

Python v2:
ret[info[0]] = np.array(map(lambda x:float(x), info[1:]))

Python v3:
ret[info[0]] = np.array(list(map(lambda x:float(x), info[1:])))
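
For anyone curious why the one-line change matters, here is a minimal, self-contained reproduction (the three-value vector is made up for illustration; real GloVe vectors are 300-dimensional). If the script ever stacks one of these 0-d entries next to a real N_word-length vector (e.g. a zero fallback), np.stack raises the shape error:

import numpy as np

info = "the 0.1 0.2 0.3".split()

# Python 3: map() is a lazy iterator, so np.array() wraps it in a
# 0-d object array instead of parsing it into a float vector.
bad = np.array(map(float, info[1:]))          # shape (), dtype object
good = np.array(list(map(float, info[1:])))   # shape (3,), dtype float64

try:
    np.stack([bad, np.zeros(3)], axis=0)
except ValueError as e:
    print(e)  # all input arrays must have the same shape
print(np.stack([good, np.zeros(3)], axis=0).shape)  # (2, 3)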

Let me know if it helped.

Now I'm having another error:

C:\Users\albel\Downloads\SQLNet>python extract_vocab.py
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Length of word vocabulary: 1917480
Length of used word vocab: 39355
Traceback (most recent call last):
File "extract_vocab.py", line 65, in
np.save(open('glove/usedwordemb.npy', 'w'), emb_array)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\lib\npyio.py", line 509, in save
pickle_kwargs=pickle_kwargs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\lib\format.py", line 555, in write_array
version)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\lib\format.py", line 328, in _write_array_header
fp.write(header_prefix)
TypeError: write() argument must be str, not bytes

Please make the update as below:

np.save(open('glove/usedwordemb.npy', 'w'), emb_array) -> np.save(open('glove/usedwordemb.npy', 'wb'), emb_array)
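
As a side note, the explicit file handle can be avoided entirely: when np.save is given a path instead of an open file, it opens the file itself in binary mode, so there is no mode flag to get wrong. A minimal sketch (the array here is a placeholder):

import numpy as np

emb_array = np.zeros((2, 300))  # placeholder for the real used-word embedding
np.save('glove/usedwordemb.npy', emb_array)  # numpy handles binary mode itself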

@koolcoder007 It works! Thanks for your help.

I'm coming back again because running python train.py --ca gives me the error below. Any insight?
Thanks.

C:\Users\albel\Downloads\SQLNet>python train.py --ca
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Using fixed embedding
Traceback (most recent call last):
File "train.py", line 57, in
gpu=GPU, trainable_emb = args.train_emb)
File "C:\Users\albel\Downloads\SQLNet\sqlnet\model\sqlnet.py", line 43, in init
self.agg_pred = AggPredictor(N_word, N_h, N_depth, use_ca=use_ca)
File "C:\Users\albel\Downloads\SQLNet\sqlnet\model\modules\aggregator_predict.py", line 18, in init
dropout=0.3, bidirectional=True)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\torch\nn\modules\rnn.py", line 425, in init
super(LSTM, self).init('LSTM', *args, **kwargs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\torch\nn\modules\rnn.py", line 52, in init
w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
TypeError: new() received an invalid combination of arguments - got (float, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (float, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (float, int)

Fixed by casting the sizes to int (the same cast applies to every nn.LSTM construction in the model):
self.cond_num_lstm = nn.LSTM(input_size=int(N_word), hidden_size=int(N_h/2),
num_layers=int(N_depth), batch_first=True,
dropout=0.3, bidirectional=True)
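
The underlying cause is the change in division semantics: in Python 2, N_h/2 is integer (floor) division, while in Python 3 it produces a float, which recent PyTorch versions reject as a tensor size. A minimal sketch with illustrative sizes:

import torch.nn as nn

N_word, N_h, N_depth = 300, 100, 2

# N_h / 2 == 50.0 in Python 3; torch.Tensor(gate_size, layer_input_size)
# inside nn.LSTM then fails with the (float, int) error above.
# Floor division (or int()) keeps the size an integer:
lstm = nn.LSTM(input_size=N_word, hidden_size=N_h // 2,
               num_layers=N_depth, batch_first=True,
               dropout=0.3, bidirectional=True)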


The same binary-mode fix should also be applied in sqlnet/utils.py, where the embedding is read back:

with open('glove/usedwordemb.npy') as inf -> with open('glove/usedwordemb.npy', 'rb') as inf
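
For completeness, the corrected read-back then looks like this (a minimal sketch; np.load also accepts the path directly, which avoids the mode flag):

import numpy as np

# .npy is a binary format, so the reader needs 'rb' just as the writer needs 'wb':
with open('glove/usedwordemb.npy', 'rb') as inf:
    used_word_emb = np.load(inf)
print(used_word_emb.shape)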