memray/seq2seq-keyphrase-pytorch

Does not support multilayer decoder

Closed this issue · 1 comment

Hi, Rui

Inspired by BERT, I tried to set the encoder and decoder to more than one layer to see whether performance would improve. However, the model fails to run when I set "-enc_layers 4 -dec_layers 4".

Below are my settings:

    python3 train.py -data data/kp20k -vocab data/kp20k.vocab.pt -exp_path "output/exp/attn_general.input_feeding.copy/%s.%s" -model_path "output/model/attn_general.input_feeding.copy/%s.%s" -pred_path "output/pred/attn_general.input_feeding.copy/%s.%s" -exp "kp20k" -batch_size 16 -copy_attention -run_valid_every 500 -save_model_every 1000000 -epochs 20 -beam_size 16 -beam_search_batch_size 1 -train_ml -attention_mode general -teacher_forcing_ratio 1 -learning_rate 0.00005 -max_sent_length 3 -rnn_size 256 -enc_layers 4 -dec_layers 4 -bidirectional -encoder_type brnn

And I got an error:
    Traceback (most recent call last):
      File "train.py", line 894, in main
        train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, test_data_loader, opt)
      File "train.py", line 476, in train_model
        loss_ml, decoder_log_probs = train_ml(one2one_batch, model, optimizer_ml, criterion, opt)
      File "train.py", line 114, in train_ml
        decoder_log_probs, _, _ = model.forward(src, src_len, trg, src_oov, oov_lists)
      File "/input/seq2seq-keyphrase-pytorch/pykp/model.py", line 415, in forward
        trg_mask=trg_mask, ctx_mask=ctx_mask)
      File "/input/seq2seq-keyphrase-pytorch/pykp/model.py", line 526, in decode
        trg_emb, init_hidden
      File "/usr/local/miniconda3/envs/dl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/miniconda3/envs/dl/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 178, in forward
        self.check_forward_args(input, hx, batch_sizes)
      File "/usr/local/miniconda3/envs/dl/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 147, in check_forward_args
        'Expected hidden[0] size {}, got {}')
      File "/usr/local/miniconda3/envs/dl/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 143, in check_hidden_size
        raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
    RuntimeError: Expected hidden[0] size (4, 15, 256), got (1, 15, 256)
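
For context, PyTorch's nn.LSTM expects the initial h_0 and c_0 to have shape (num_layers * num_directions, batch, hidden_size). Here is a minimal standalone sketch that reproduces the requirement; the layer count, hidden size, and batch size follow my settings and the error message above, while the input size and sequence length are made up:

    import torch
    import torch.nn as nn

    # A decoder-style LSTM: 4 layers, unidirectional, hidden_size 256
    # (matching -dec_layers 4 -rnn_size 256 above).
    decoder_rnn = nn.LSTM(input_size=128, hidden_size=256, num_layers=4, batch_first=True)

    batch_size, seq_len = 15, 7                 # 15 is the batch size in the error message
    x = torch.randn(batch_size, seq_len, 128)

    # h_0 and c_0 must be (num_layers * num_directions, batch, hidden_size) = (4, 15, 256);
    # feeding a (1, 15, 256) state raises the same "Expected hidden[0] size" RuntimeError.
    h0 = torch.zeros(4, batch_size, 256)
    c0 = torch.zeros(4, batch_size, 256)
    output, (h_n, c_n) = decoder_rnn(x, (h0, c0))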

It seems that the encoder only outputs the context vector of a single layer even though I set "-enc_layers 4". I examined the code and found that the problem probably comes from the encode function of the Seq2SeqLSTMAttention class:

    if self.bidirectional:
        h_t = torch.cat((src_h_t[-1], src_h_t[-2]), 1)
        c_t = torch.cat((src_c_t[-1], src_c_t[-2]), 1)
    else:
        h_t = src_h_t[-1]
        c_t = src_c_t[-1]

The above code shows that the encoder only keeps the h and c of the top layer and passes them to the decoder. As far as I know, in a multilayer seq2seq model the encoder should pass the hidden states of all layers to the decoder, but I am not entirely sure about this because I am new to the NLP field.
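
For illustration, below is a rough sketch of what I mean. This is only my guess at a possible fix, not code from the repository, and the helper name init_decoder_state is made up. It reshapes the encoder's final states to (num_layers, num_directions, batch, hidden_size), following the layout documented for nn.LSTM, and concatenates the two directions of every layer instead of only the top one:

    import torch

    # Hypothetical helper (not in the repository): build per-layer initial decoder states
    # from the final states of a (possibly bidirectional) multilayer encoder LSTM.
    # src_h_t / src_c_t have shape (num_layers * num_directions, batch, hidden_size).
    def init_decoder_state(src_h_t, src_c_t, num_layers, bidirectional):
        if not bidirectional:
            return src_h_t, src_c_t                      # already (num_layers, batch, hidden)
        batch_size, hidden_size = src_h_t.size(1), src_h_t.size(2)
        h_t = src_h_t.view(num_layers, 2, batch_size, hidden_size)
        c_t = src_c_t.view(num_layers, 2, batch_size, hidden_size)
        # concatenate forward and backward states of every layer -> (num_layers, batch, 2 * hidden)
        h_t = torch.cat((h_t[:, 0], h_t[:, 1]), dim=2)
        c_t = torch.cat((c_t[:, 0], c_t[:, 1]), dim=2)
        return h_t, c_t

With something like this, the decoder would of course need a hidden size of 2 * hidden_size (or a bridging linear layer) when the encoder is bidirectional, but at least every decoder layer would receive an initial state.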

Could you kindly look into this problem? Thank you very much!

Initially I tried to make it compatible with both LSTM and GRU, but that no longer seems very necessary, and as a result the current implementation does not support multiple layers.