Requirements and device versions

Question

pow3rpi opened this issue a year ago · 2 comments

Hi there,

First of all, thank you for your work. I believe your model can produce wonderful results, but every time I try to run it I hit the same problem.

Before I describe the issue, I'd recommend pinning the exact Pillow version in your "requirements.txt". The code doesn't work with the latest Pillow release and fails with the following error:

ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/PIL/__init__.py)

Actually, your Pillow version should be <7.0.0 (the release that removed "PILLOW_VERSION"); otherwise you have to edit the "PIL/__init__.py" file to make "PILLOW_VERSION" importable again.
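An alternative to editing "PIL/__init__.py" by hand is to restore the missing name at runtime, before anything imports it. The sketch below is only an illustration: `ensure_pillow_version_alias` is a hypothetical helper, not part of Pillow or of this repository, and it assumes the installed Pillow exposes `__version__`.

```python
def ensure_pillow_version_alias(pil_module):
    """Make sure `pil_module` exposes PILLOW_VERSION.

    Hypothetical helper: if the attribute is missing (as in newer Pillow
    releases), copy the value of `__version__` into it. Returns the value
    that PILLOW_VERSION holds afterwards.
    """
    if not hasattr(pil_module, "PILLOW_VERSION"):
        pil_module.PILLOW_VERSION = pil_module.__version__
    return pil_module.PILLOW_VERSION


# Intended usage (assumption: run this before torchvision is imported,
# because the failing import happens at torchvision import time):
#
#   import PIL
#   ensure_pillow_version_alias(PIL)
#   import torchvision
```

Pinning Pillow in "requirements.txt" remains the simpler fix; the shim only helps when the Pillow version cannot be changed.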

Coming back to my problem: when I run "bash test_scripts/ast_summer2winteryosemite.sh" (or any other test command), the model produces the following error regardless of which versions of CUDA (10 to 12), cudatoolkit, and Ubuntu (18, 20, 22) I use:

Traceback (most recent call last):
  File "test.py", line 34, in <module>
    generated = model(data_i, mode='inference')
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ML-OpenPet/tsit/models/pix2pix_model.py", line 51, in forward
    fake_image, _ = self.generate_fake(input_semantics, real_image)
  File "/home/ubuntu/ML-OpenPet/tsit/models/pix2pix_model.py", line 197, in generate_fake
    fake_image = self.netG(input_semantics, real_image, z=z)
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ML-OpenPet/tsit/models/networks/generator.py", line 85, in forward
    ft0, ft1, ft2, ft3, ft4, ft5, ft6, ft7 = self.content_stream(content)
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ML-OpenPet/tsit/models/networks/stream.py", line 29, in forward
    x0 = self.res_0(input) # (n,64,256,512)
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ML-OpenPet/tsit/models/networks/architecture.py", line 108, in forward
    x_s = self.shortcut(x)
  File "/home/ubuntu/ML-OpenPet/tsit/models/networks/architecture.py", line 119, in shortcut
    x_s = self.actvn(self.norm_layer_s(self.conv_s(x)))
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    hook(self, input)
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/utils/spectral_norm.py", line 99, in __call__
    setattr(module, self.name, self.compute_weight(module, do_power_iteration=module.training))
  File "/home/ubuntu/anaconda3/envs/tsit/lib/python3.7/site-packages/torch/nn/utils/spectral_norm.py", line 85, in compute_weight
    sigma = torch.dot(u, torch.mv(weight_mat, v))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:117

I hope you can help me resolve this issue. All tests were run on AWS. It would be extremely helpful if you could share your full list of working requirements (the output of "pip freeze") and your versions of CUDA, the NVIDIA driver, and Ubuntu.

Thank you in advance!

Answer 1 · 2024-06-05T02:16:16.000Z

Upgrade torch and torchvision to builds that match your GPU. The following command may help:

pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html

Pillow should be 6.2.2.
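One plausible reading of the cublas error is that the installed PyTorch build was compiled against a CUDA toolkit older than the GPU's architecture, which would explain why changing the system CUDA version made no difference. The sketch below illustrates that check under stated assumptions: the table lists the minimum CUDA toolkit per compute capability as published by NVIDIA for a few common architectures, and `build_supports_gpu` is a hypothetical helper, not a PyTorch API.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Only a few common architectures are listed; verify against NVIDIA's
# documentation for your own GPU.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (data center)
    (8, 6): (11, 1),  # Ampere (e.g. RTX 30 series)
    (8, 9): (11, 8),  # Ada Lovelace
    (9, 0): (11, 8),  # Hopper
}


def build_supports_gpu(build_cuda, capability):
    """Check whether a build compiled against `build_cuda` can target `capability`.

    `build_cuda` is a version string such as "10.0" or "11.1";
    `capability` is a (major, minor) tuple. This is a necessary
    condition only: the wheel must also ship kernels (or PTX) for
    that architecture.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise KeyError(f"compute capability {capability} is not in the table")
    major, minor = (int(part) for part in build_cuda.split(".")[:2])
    return (major, minor) >= required


# With PyTorch installed, the two inputs can be read like this
# (torch.version.cuda is None on CPU-only builds):
#
#   import torch
#   build_supports_gpu(torch.version.cuda, torch.cuda.get_device_capability(0))
```

For example, `build_supports_gpu("10.0", (8, 6))` returns `False`, while `build_supports_gpu("11.1", (8, 6))` returns `True`, which is consistent with the suggestion to move to a cu111 build.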

Answer 2 · 2024-06-18T19:05:38.000Z

@clmdy Thanks for specifying the Pillow version! The CUDA issue is also resolved.