ratishsp/data2text-plan-py

Run into TypeError while trying to train the model

AidenKitamura opened this issue · 3 comments

Hello! I was following all the steps in the README file, and while running the model training step I ran into a TypeError. Here are the details:

python version: 2.7.16
torch version: 10.1
torchtext version: 0.2.3
CUDA version: 10.1

When running this:
python train.py -data $BASE/preprocess/roto -save_model $BASE/gen_model/$IDENTIFIER/roto -encoder_type1 mean -decoder_type1 pointer -enc_layers1 1 -dec_layers1 1 -encoder_type2 brnn -decoder_type2 rnn -enc_layers2 2 -dec_layers2 2 -batch_size 5 -feat_merge mlp -feat_vec_size 600 -word_vec_size 600 -rnn_size 600 -seed 1234 -start_checkpoint_at 4 -epochs 25 -optim adagrad -learning_rate 0.15 -adagrad_accumulator_init 0.1 -report_every 100 -copy_attn -truncated_decoder 100 -gpuid $GPUID -attn_hidden 64 -reuse_copy_attn -start_decay_at 4 -learning_rate_decay 0.97 -valid_batch_size 5

I got the following message:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
/home/aiden/anaconda/anaconda2/lib/python2.7/site-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.3 and num_layers=1
"num_layers={}".format(dropout, num_layers))
/home/aiden/anaconda/anaconda2/lib/python2.7/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
Experiment 22-4.4 using attn_dim of 64
Loading train dataset from ../boxscore-data//preprocess/roto.train.1.pt, number of examples: 3371

  • vocabulary size. source1 = 1164; target1 = 391, source2 = 956; target2 = 9902
  • src feature 0 size = 702
  • src feature 1 size = 39
  • src feature 2 size = 4
    Building model...
    Intializing model parameters.
    Intializing model parameters.
    NMTModel(
    (encoder): MeanEncoder(
    (embeddings): Embeddings(
    (make_embedding): Sequential(
    (emb_luts): Elementwise(
    (0): Embedding(1164, 600, padding_idx=1)
    (1): Embedding(702, 600, padding_idx=1)
    (2): Embedding(39, 600, padding_idx=1)
    (3): Embedding(4, 600, padding_idx=1)
    )
    (mlp): Sequential(
    (0): Linear(in_features=2400, out_features=600, bias=True)
    (1): ReLU()
    )
    )
    )
    (dropout): Dropout(p=0.3)
    (attn): GlobalSelfAttention(
    (transform_in): Sequential(
    (0): Linear(in_features=600, out_features=64, bias=True)
    (1): ELU(alpha=0.1)
    )
    (linear_in): Linear(in_features=64, out_features=64, bias=False)
    (linear_out): Linear(in_features=1200, out_features=600, bias=False)
    (sm): Softmax()
    (tanh): Tanh()
    (dropout): Dropout(p=0.3)
    )
    )
    (decoder): PointerRNNDecoder(
    (embeddings): Embeddings(
    (make_embedding): Sequential(
    (emb_luts): Elementwise(
    (0): Embedding(391, 600, padding_idx=1)
    )
    )
    )
    (dropout): Dropout(p=0.3)
    (rnn): LSTM(600, 600, dropout=0.3)
    (attn): PointerAttention(
    (linear_in): Linear(in_features=600, out_features=600, bias=False)
    (sm): LogSoftmax()
    )
    )
    (generator): Sequential(
    (0): Linear(in_features=600, out_features=391, bias=True)
    (1): LogSoftmax()
    )
    )
    NMTModel(
    (encoder): RNNEncoder(
    (embeddings): Embeddings(
    (make_embedding): Sequential(
    (emb_luts): Elementwise(
    (0): Embedding(956, 600, padding_idx=1)
    (1): Embedding(547, 600, padding_idx=1)
    (2): Embedding(33, 600, padding_idx=1)
    (3): Embedding(4, 600, padding_idx=1)
    )
    (mlp): Sequential(
    (0): Linear(in_features=2400, out_features=600, bias=True)
    (1): ReLU()
    )
    )
    )
    (rnn): LSTM(600, 300, num_layers=2, dropout=0.3, bidirectional=True)
    )
    (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
    (make_embedding): Sequential(
    (emb_luts): Elementwise(
    (0): Embedding(9902, 600, padding_idx=1)
    )
    )
    )
    (dropout): Dropout(p=0.3)
    (rnn): StackedLSTM(
    (dropout): Dropout(p=0.3)
    (layers): ModuleList(
    (0): LSTMCell(1200, 600)
    (1): LSTMCell(600, 600)
    )
    )
    (attn): GlobalAttention(
    (linear_in): Linear(in_features=600, out_features=600, bias=False)
    (linear_out): Linear(in_features=1200, out_features=600, bias=False)
    (sm): Softmax()
    (tanh): Tanh()
    )
    )
    (generator): CopyGenerator(
    (linear): Linear(in_features=600, out_features=9902, bias=True)
    (linear_copy): Linear(in_features=600, out_features=1, bias=True)
    )
    )
  • number of parameters: 7062951
    ('encoder: ', 3348560)
    ('decoder: ', 3714391)
  • number of parameters: 26876703
    ('encoder: ', 6694200)
    ('decoder: ', 20182503)
    Making optimizer for training.
    Making optimizer for training.

Start training...

  • number of epochs: 25, starting from Epoch 1
  • batch size: 5

Loading train dataset from ../boxscore-data//preprocess/roto.train.1.pt, number of examples: 3371
/home/aiden/anaconda/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "train.py", line 454, in
main()
File "train.py", line 446, in main
train_model(model1, model2, fields, optim1, optim2, data_type, model_opt)
File "train.py", line 256, in train_model
train_stats, train_stats2 = trainer.train(train_iter, epoch, report_func)
File "/home/aiden/files/harvardnlp/data2text-plan-py/onmt/Trainer.py", line 181, in train
report_stats, total_stats2, report_stats2, normalization)
File "/home/aiden/files/harvardnlp/data2text-plan-py/onmt/Trainer.py", line 345, in _gradient_accumulation
self.model(src, tgt, src_lengths, dec_state)
File "/home/aiden/anaconda/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/aiden/files/harvardnlp/data2text-plan-py/onmt/Models.py", line 684, in forward
memory_lengths=lengths)
File "/home/aiden/anaconda/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/aiden/files/harvardnlp/data2text-plan-py/onmt/Models.py", line 339, in forward
attns[k] = torch.stack(attns[k])
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor

When I checked Models.py, this is what I see at line 339 onwards:
for k in attns:
    attns[k] = torch.stack(attns[k])

And this from _run_forward_pass():
# Calculate the attention.
p_attn = self.attn(rnn_output.transpose(0, 1).contiguous(), memory_bank.transpose(0, 1), memory_lengths=memory_lengths)
attns["std"] = p_attn

# decoder_outputs = self.dropout(decoder_outputs)
return decoder_final, None, attns
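
For reference, recent PyTorch releases only accept a tuple or list of tensors in torch.stack(), and attns["std"] above already holds a single tensor, which is exactly what the TypeError complains about. A minimal standalone sketch (hypothetical shapes, not the project's code):

import torch

# A single attention tensor standing in for attns["std"] above
# (hypothetical tgt_len x batch x src_len shape).
p_attn = torch.zeros(10, 5, 602)

# Fine: torch.stack() over a list/tuple of tensors.
stacked = torch.stack([p_attn, p_attn])

# On recent PyTorch this raises the reported error, because a bare tensor
# is not a tuple/list of tensors:
try:
    torch.stack(p_attn)
except TypeError as e:
    print(e)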

Is there anything wrong with _run_forward_pass()? Thanks.

I think the issue is with the version of PyTorch. The required PyTorch version is 0.3.1: https://github.com/ratishsp/data2text-plan-py/blob/master/requirements.txt
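
After downgrading, a quick sanity check that the intended build is actually the one being imported (a sketch; only the PyTorch 0.3.1 pin is taken from the linked requirements.txt):

import torch
import torchtext

# The repository requires PyTorch 0.3.1 (see the linked requirements.txt).
print("torch: " + torch.__version__)
# torchtext's version attribute may be absent in some old releases, hence getattr.
print("torchtext: " + getattr(torchtext, "__version__", "unknown"))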

Hi! I just downgraded PyTorch and its dependencies, and now I'm getting these errors:

THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=88 error=30 : unknown error
Traceback (most recent call last):
File "train.py", line 54, in
cuda.set_device(opt.gpuid[0])
File "/home/aiden/anaconda/anaconda2/lib/python2.7/site-packages/torch/cuda/init.py", line 244, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (30) : unknown error at torch/csrc/cuda/Module.cpp:88

while nvidia-smi gives this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.74 Driver Version: 418.74 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:01:00.0 On | N/A |
| 23% 29C P8 17W / 250W | 160MiB / 12188MiB | 9% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 602 G /usr/lib/Xorg 154MiB |
| 0 1285 G compton 3MiB |
+-----------------------------------------------------------------------------+
What could be wrong? Thanks.
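
A standalone sanity check with plain torch.cuda calls, independent of train.py, would look roughly like this sketch (train.py only gets as far as cuda.set_device(opt.gpuid[0]) before failing):

import torch

print("CUDA available: " + str(torch.cuda.is_available()))
print("Device count: " + str(torch.cuda.device_count()))

if torch.cuda.is_available():
    torch.cuda.set_device(0)  # the same call that fails in train.py with gpuid[0]
    print("Device 0: " + torch.cuda.get_device_name(0))
    x = torch.randn(2, 2).cuda()  # forces CUDA context creation on the device
    print(x + x)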

I assume that you have set the gpuid correctly.
Otherwise, it seems to be a CUDA-specific issue. In my setup I have CUDA 8.0; I haven't tried with CUDA 10.1.
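
If it helps, the CUDA runtime a given PyTorch wheel was built against can be read from the package itself and compared with what nvidia-smi reports for the driver; a small sketch (the attribute may be missing on very old builds):

import torch

# nvidia-smi's "CUDA Version: 10.1" describes the driver; the installed wheel may have
# been built against an older runtime (e.g. CUDA 8.0, as in the setup above).
print("torch: " + torch.__version__)
print("built against CUDA: " + str(getattr(torch.version, "cuda", "unknown")))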