Exact pytorch version requirement

Question

AshishSardana opened this issue 4 years ago · 3 comments

I'm trying to run train.py with PyTorch 1.4 (Docker container nvcr.io/nvidia/pytorch:20.01-py3), which results in a PyTorch-related error:

root@dc5fb4969999:/SceneGraphModification/code# python train.py --data-dir $DATA --epochs $EPOCH --seed 1 --ckpt-dir $CKPT_DIR --modification $FUSION --batch-size 256 --accumulation-steps 1 > $log
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py:50: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "train.py", line 166, in <module>
    main()
  File "train.py", line 133, in main
    loss = model(samples["src_graph"], samples["src_text"], samples["tgt_graph"])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 167, in forward
    _, node_outputs, _, edge_outputs = self.graph_dec(enc_info, tgt_graph["nodes"], tgt_graph["edges"])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 487, in forward
    node_rnn_outputs, _, node_outputs = self.node_forward(enc_info, nodes["x"], nodes_lens)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 440, in node_forward
    context, _ = self.node_att(rnn_outputs, enc_info["mem"], enc_info["mem_masks"])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 366, in forward
    align.masked_fill_(1 - mask, -float('inf'))
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 394, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.
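The PyTorch 1.4 error happens because `mask` arrives as a `bool` tensor here (newer releases produce `bool` masks where older ones produced `uint8`), and `1 - mask` is not defined for `bool`. As the error message suggests, inverting with `~` fixes it. A minimal sketch, assuming the mask is `True`/`1` at positions to keep (which is what `1 - mask` implied); `masked_scores` is a hypothetical helper name, and in `models.py` the equivalent in-place change would be `align.masked_fill_(~mask.bool(), -float('inf'))`:

```python
import torch


def masked_scores(align: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Set attention scores to -inf wherever `mask` marks padding.

    Assumes `mask` is True/1 at positions to keep, as `1 - mask` implied.
    """
    keep = mask.bool()  # normalise uint8 (older style) or bool masks to bool
    # `~keep` replaces `1 - mask`; subtraction is not supported on bool tensors
    return align.masked_fill(~keep, -float("inf"))


if __name__ == "__main__":
    align = torch.zeros(1, 3)
    mask = torch.tensor([[True, True, False]])
    print(masked_scores(align, mask))  # last position becomes -inf
```

Converting with `.bool()` first matters: on a `uint8` mask, `~` is a bitwise NOT (1 becomes 254, 0 becomes 255), so every position would be treated as masked.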

I've also tried running it with PyTorch 1.8, which leads to a different PyTorch error:

root@1d6ce713c477:/SceneGraphModification/code# python train.py --data-dir $DATA --epochs $EPOCH --seed 1 --ckpt-dir $CKPT_DIR --modification $FUSION --batch-size 256 --accumulation-steps 1 > $log
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/rnn.py:58: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
  warnings.warn("dropout option adds dropout after all but last "
/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/data_utils.py:37: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  ../torch/csrc/utils/python_arg_parser.cpp:962.)
  flat_edges = [edge.view(-1)[torch.tril(edge, -1).view(-1).nonzero()].view(-1) for edge in edges]
Traceback (most recent call last):
  File "train.py", line 166, in <module>
    main()
  File "train.py", line 133, in main
    loss = model(samples["src_graph"], samples["src_text"], samples["tgt_graph"])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 167, in forward
    _, node_outputs, _, edge_outputs = self.graph_dec(enc_info, tgt_graph["nodes"], tgt_graph["edges"])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 487, in forward
    node_rnn_outputs, _, node_outputs = self.node_forward(enc_info, nodes["x"], nodes_lens)
  File "/media/d2b/ashish/tme/gaugan/SceneGraphModification/code/models.py", line 435, in node_forward
    padded_nodes_embeds = nn.utils.rnn.pack_padded_sequence(nodes_embeds, nodes_len, batch_first=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 245, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
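The PyTorch 1.8 traceback comes from `pack_padded_sequence` receiving `nodes_len` on the GPU; newer PyTorch releases require the lengths as a 1D int64 CPU tensor. The warning just before it is about calling `nonzero()` without `as_tuple`. A minimal sketch of both changes, assuming `nodes_len` is the 1D lengths tensor from `models.py` line 435 and `edge` is a square adjacency tensor as in `data_utils.py` line 37; the helper names `pack_nodes` and `flatten_lower_triangle` are hypothetical. These edits target newer releases (the `as_tuple` keyword does not exist in very old ones); on PyTorch 1.1, which the authors report using, the original code should run unchanged.

```python
import torch
import torch.nn as nn


def pack_nodes(nodes_embeds: torch.Tensor, nodes_len: torch.Tensor):
    """Pack a padded (batch, time, feat) tensor for an RNN.

    Newer PyTorch releases require `lengths` to be a 1D int64 CPU tensor,
    so the lengths are moved off the GPU before packing. The default
    enforce_sorted=True still expects lengths in descending order.
    """
    lengths = nodes_len.to(device="cpu", dtype=torch.int64)
    return nn.utils.rnn.pack_padded_sequence(nodes_embeds, lengths, batch_first=True)


def flatten_lower_triangle(edge: torch.Tensor) -> torch.Tensor:
    """Return the nonzero strictly-lower-triangular entries of a square `edge`.

    Same result as the original expression in data_utils.py; passing
    as_tuple=False explicitly avoids the deprecated nonzero() overload.
    """
    idx = torch.tril(edge, -1).view(-1).nonzero(as_tuple=False)
    return edge.view(-1)[idx].view(-1)


if __name__ == "__main__":
    embeds = torch.randn(2, 3, 4)  # batch of 2, max length 3, 4 features
    lens = torch.tensor([3, 2])
    if torch.cuda.is_available():
        lens = lens.cuda()  # reproduces the GPU-lengths case from the traceback
    packed = pack_nodes(embeds, lens)
    print(packed.batch_sizes)  # tensor([2, 2, 1])
    edge = torch.tensor([[0, 0, 0], [2, 0, 0], [0, 3, 0]])
    print(flatten_lower_triangle(edge))  # tensor([2, 3])
```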

Could you share the exact PyTorch version (and, if it helps, the CUDA and cuDNN versions) you developed this codebase with?
I'd appreciate it!

Answer 1 · 2021-02-27T02:46:45.000Z

Sorry, I accidentally put PyTorch 1.4 in the README file. We used PyTorch 1.1 for all experiments.

Answer 2 · 2021-02-27T02:52:58.000Z

Regarding CUDA, we used CUDA 10.0. Hope this helps.
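To confirm an environment matches what the authors report in this thread (PyTorch 1.1, CUDA 10.0), a small check like the following can be run before training. This is a sketch, not part of the repository: `EXPECTED_TORCH`, `EXPECTED_CUDA`, `environment_report` and `matches_expected` are hypothetical names, and no cuDNN version was given in the thread, so cuDNN is only reported, not checked.

```python
import torch

# Versions reported by the authors in this thread (hypothetical constant names).
EXPECTED_TORCH = "1.1"
EXPECTED_CUDA = "10.0"


def environment_report() -> dict:
    """Collect the installed PyTorch, CUDA and cuDNN versions."""
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None,
    }


def matches_expected(report: dict) -> bool:
    """True if the PyTorch minor version and CUDA version match the reported ones."""
    base = report["torch"].split("+")[0]  # drop local suffixes like "+a5b4d78"
    torch_ok = base == EXPECTED_TORCH or base.startswith(EXPECTED_TORCH + ".")
    cuda_ok = report.get("cuda") == EXPECTED_CUDA
    return torch_ok and cuda_ok


if __name__ == "__main__":
    report = environment_report()
    print(report)
    if not matches_expected(report):
        print("Environment differs from the reported PyTorch 1.1 / CUDA 10.0 setup")
```

The prefix check treats `1.1.0` as a match but not `1.10.0` or the NGC container's `1.4.0a0+...` build.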

Answer 3 · 2021-02-27T09:07:56.000Z

Thank you!