POSTECH-CVLab/PyTorch-StudioGAN

TypeError: 'tuple' object is not callable

goongzi-leean opened this issue · 5 comments

Thank you for this great work, but I seem to be having trouble with my first run, and I think it is a bug. The problem is as follows:

Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Traceback (most recent call last):
File "src/main.py", line 193, in <module>
hdf5_path=hdf5_path)
File "drive/StudioGAN/src/loader.py", line 394, in load_worker
gen_acml_loss = worker.train_generator(current_step=step)
File "drive/StudioGAN/src/worker.py", line 627, in train_generator
gen_acml_loss.backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 52, in backward
grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
File "drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 63, in forward
grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False, output_mask)
TypeError: 'tuple' object is not callable

Looking forward to your reply.

Sorry, I forgot to post the command I ran: !python3 src/main.py -cfg './src/configs/CIFAR10/StyleGAN2-ADA.yaml' -data '../../Dataset/' -save './outputs/cifar10_outputs/StyleGAN2-ADA/' --seed 82624 -t -hdf5 -l -metrics is fid prdc --pre_resizer lanczos --post_resizer friendly -sr -sf -sf_num 50000 -ifid --GAN_train --GAN_test

I have confirmed that this issue arises with PyTorch 1.12 (the version in the latest Docker image), and I fixed it two days ago! Make sure that you are using the latest version of StudioGAN.
If that still doesn't help, consider downgrading PyTorch to 1.10.
Thanks!
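To see whether an environment falls into the affected range mentioned above (PyTorch 1.12 and later, versus the suggested 1.10), the installed version string can be compared numerically. The sketch below is a hypothetical helper, not part of StudioGAN; the names parse_major_minor and is_torch_1_12_or_newer are illustrative, and it assumes version strings of the usual "MAJOR.MINOR..." form such as "1.12.0+cu113".

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string like '1.12.0+cu113'.

    Hypothetical helper: only the leading 'MAJOR.MINOR' part is parsed,
    so local suffixes (+cu113) and pre-release tags (a0, dev...) are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_torch_1_12_or_newer(version: str) -> bool:
    """True if the version is in the range where the thread reports the bug."""
    return parse_major_minor(version) >= (1, 12)
```

In practice it would be called as is_torch_1_12_or_newer(torch.__version__). Comparing integer tuples avoids the string-comparison pitfall where "1.9" sorts after "1.12".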

Dear author,

Sorry for the late reply. I ran into a new problem and have been working on solving it.

tcmalloc: large alloc 20000006144 bytes == 0x7f548fe82000 @ 0x7f5cda34b1e7 0x7f5c69bfb0ce 0x7f5c69c51cf5 0x7f5c69c51f4f 0x7f5c69cf4673 0x5936cc 0x548c51 0x5127f1 0x549e0e 0x4bcb19 0x5134a6 0x549e0e 0x593fce 0x548ae9 0x5127f1 0x549576 0x593fce 0x5118f8 0x593dd7 0x5118f8 0x549576 0x593fce 0x548ae9 0x5127f1 0x549576 0x593fce 0x548ae9 0x5127f1 0x549576 0x593fce 0x548ae9
……
/usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 6 leaked semaphores to clean up at shutdown
len(cache))

This caused my run to stop at the first --save_freq step, which was frustrating because I'm still a beginner.
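The tcmalloc "large alloc" line reports the size of a single allocation request in bytes. Converting the number from the log makes its scale clearer. This is plain arithmetic; the thread does not say which part of StudioGAN issued the request.

```python
def bytes_to_units(n_bytes: int) -> dict:
    """Convert a byte count to decimal gigabytes (GB) and binary gibibytes (GiB)."""
    return {"GB": n_bytes / 10**9, "GiB": n_bytes / 2**30}


# Allocation size copied from the tcmalloc warning above.
sizes = bytes_to_units(20000006144)
print(f"{sizes['GB']:.2f} GB = {sizes['GiB']:.2f} GiB")  # prints "20.00 GB = 18.63 GiB"
```

So a single request of roughly 18.6 GiB of host memory was made, which is more RAM than many single-machine or notebook environments provide.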

In addition, please don't take this the wrong way. Since I noticed img_channels = 3 in your code, I'd like to make a small suggestion: maybe you could add a simple single-channel (channel=1) dataset, which would make StudioGAN easier for others to use.

I tried this and found that I only needed to modify the config file and the places in the code where cifar10 and channel=3 appear.
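A different workaround from the one described above, sketched here only for comparison, is to leave img_channels = 3 unchanged and replicate each grayscale image into three identical channels before it enters the pipeline. The helper gray_to_rgb below is hypothetical and assumes images are NumPy arrays in HxW or HxWx1 layout.

```python
import numpy as np


def gray_to_rgb(img: np.ndarray) -> np.ndarray:
    """Replicate a single-channel image into 3 identical channels (HxW or HxWx1 -> HxWx3)."""
    if img.ndim == 3 and img.shape[-1] == 1:
        img = img[..., 0]  # drop the trailing singleton channel axis
    if img.ndim != 2:
        raise ValueError(f"expected HxW or HxWx1, got shape {img.shape}")
    # Add a channel axis and copy the gray values into R, G and B.
    return np.repeat(img[..., np.newaxis], 3, axis=-1)
```

This trades extra memory and compute (three copies of the same data) for not touching the model code. torchvision's transforms.Grayscale(num_output_channels=3) does a similar replication for PIL images. Native single-channel support, as suggested in the comment, avoids that overhead.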

Thank you!

Best,

Leean

I discovered another problem. Although I have already solved it, I still want to report this bug.

My code is: !python3 src/main.py -cfg './src/configs/CIFAR10/ACGAN-Mod.yaml' -data '../../Dataset/' -save './outputs/cifar10_outputs/ACGAN-Mod/' --seed 82624 --num_workers 2 -t -hdf5 -l -metrics none --pre_resizer lanczos --post_resizer friendly -sr -sf -sf_num 10000 --GAN_train --GAN_test --print_freq 5 --save_freq 10

File "src/main.py", line 193, in <module>
hdf5_path=hdf5_path)
File "/drive/StudioGAN/src/loader.py", line 391, in load_worker
real_cond_loss, dis_acml_loss = worker.train_discriminator(current_step=step)
File "/drive/StudioGAN/src/worker.py", line 308, in train_discriminator
real_cond_loss = self.cond_loss(**real_dict)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'h'

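The cause is that ResNet returns a dict with many entries, and self.cond_loss(**real_dict) unpacks every key of that dict as a keyword argument. The conditional loss's forward() therefore receives h, which it does not accept.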
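One generic way to avoid this class of error is to pass only the keys that the loss's forward() actually declares. The sketch below is a hypothetical helper (filter_kwargs is not a StudioGAN function), and the thread does not say how the reporter fixed it. For an nn.Module the signature should be read from module.forward, because Module.__call__ itself accepts **kwargs and would let everything through.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(fn: Callable, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that fn's signature can accept by name."""
    params = inspect.signature(fn).parameters
    # If fn takes **kwargs it already accepts any key, so pass everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    named = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    return {k: v for k, v in kwargs.items() if k in params and params[k].kind in named}


# Hypothetical patch for the failing line in worker.py:
# real_cond_loss = self.cond_loss(**filter_kwargs(self.cond_loss.forward, real_dict))
```

The trade-off is that silently dropping keys can hide genuine mismatches, such as a misspelled argument name; selecting the needed keys explicitly at the call site is stricter.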

Hi
A new problem arises when I load a new StyleGAN-ADA model, instead of your trained model, to continue training.

File "src/main.py", line 193, in <module>
hdf5_path=hdf5_path)
File "/drive/StudioGAN/src/loader.py", line 394, in load_worker
gen_acml_loss = worker.train_generator(current_step=step)
File "/drive/StudioGAN/src/worker.py", line 627, in train_generator
gen_acml_loss.backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 52, in backward
grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
File "/drive/StudioGAN/src/utils/style_ops/grid_sample_gradfix.py", line 63, in forward
grad_input, grad_grid = op[0](grad_output, input, grid, 0, 0, False, output_mask)

So I reverted grid_sample_gradfix.py to its previous version, and now training continues.
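The two tracebacks show the two call forms failing in different environments: op(...) raised "'tuple' object is not callable", and the updated op[0](...) failed after switching setups. Neither form alone is portable. A version-agnostic sketch checks what the operator lookup returned instead of assuming one shape. It assumes (consistent with the maintainer's comment) that from PyTorch 1.12 on, torch._C._jit_get_operation returns a tuple whose first element is the operator, while earlier versions return the operator itself. The helper unwrap_jit_op is hypothetical.

```python
from typing import Any, Callable


def unwrap_jit_op(lookup_result: Any) -> Callable:
    """Return the callable operator from a torch._C._jit_get_operation result.

    Assumption: PyTorch >= 1.12 returns (op, overload_names);
    older versions return op directly.
    """
    if isinstance(lookup_result, tuple):
        return lookup_result[0]
    return lookup_result


# Hypothetical use inside grid_sample_gradfix.py:
# op = unwrap_jit_op(torch._C._jit_get_operation('aten::grid_sampler_2d_backward'))
# grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False, output_mask)
```

Checking the returned type keeps a single copy of grid_sample_gradfix.py working across PyTorch versions, so there is no need to revert the file when the environment changes.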