zhenglab/spiralnet

Training error for SliceGAN

Issue closed · 1 comment

Getting this error when training the SliceGAN part:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [184]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

The error is traced to:

  File "train.py", line 2, in <module>
    main(mode=1)
  File "/trainman-mount/trainman-storage-542ea5a0-8586-4554-8dbf-60b996061b8a/img_outpainting/spiralnet/main.py", line 42, in main
    model.train()
  File "/trainman-mount/trainman-storage-542ea5a0-8586-4554-8dbf-60b996061b8a/img_outpainting/spiralnet/src/SliceGAN.py", line 117, in train
    fmask_data)
  File "/trainman-mount/trainman-storage-542ea5a0-8586-4554-8dbf-60b996061b8a/img_outpainting/spiralnet/src/model/model.py", line 268, in process
    g_fake_local = self.d(o * mask + data * (1 - mask))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/trainman-mount/trainman-storage-542ea5a0-8586-4554-8dbf-60b996061b8a/img_outpainting/spiralnet/src/model/networks.py", line 699, in forward
    features = self.features(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled

SliceGAN training runs normally for us in the environment described in this repo, and we could not reproduce the problem you reported. Please set up the conda environment by following the steps in the Installation section and try again.
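
Since the reply points to an environment difference, one quick way to check is to compare the version string of the installed package against the one pinned in the Installation steps. The snippet below is only a minimal sketch: `parse_version` and `same_release` are hypothetical helper names (not part of this repo or of PyTorch), and the version strings in the demo are illustrative placeholders, not the versions this repo actually requires.

```python
def parse_version(text):
    """Turn a string like '1.7.0+cu110' into (1, 7, 0).

    The local build tag after '+' is dropped, and parsing stops at the
    first component that is not purely numeric.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def same_release(installed, pinned, depth=2):
    """True when the first `depth` components match (major.minor by default)."""
    return parse_version(installed)[:depth] == parse_version(pinned)[:depth]


if __name__ == "__main__":
    # Placeholder strings for illustration only; take the real pinned
    # version from the repo's Installation steps.
    installed, pinned = "1.7.0+cu110", "1.2.0"
    status = "ok" if same_release(installed, pinned) else "MISMATCH"
    print("installed", installed, "pinned", pinned, "->", status)
```

In practice you would pass `torch.__version__` as the first argument. A mismatch does not prove the environment is the cause of the error, but it is a cheap thing to rule out before debugging the model code.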