SwinTransformer/MIM-Depth-Estimation

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!


Hi there,

Thanks for your excellent work. I get this error when I train and test your code. Do you have any idea what is wrong? As far as I can tell, the data and the model are both on CUDA.

Thanks in advance!

I solved that by switching to torch.nn.parallel.DistributedDataParallel (a minimal sketch of the wrapping is shown after the traceback below).
However, I then ran into another CUDA out-of-memory error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 11.91 GiB total capacity; 10.99 GiB already allocated; 3.88 MiB free; 11.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
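
A minimal sketch of that DistributedDataParallel wrapping, assuming a torchrun launch and a tiny placeholder network instead of the repository's actual model and training script:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Initialize the default process group (torchrun sets the required env vars).
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Move the model to this process's GPU, then wrap it so gradients are synced.
model = torch.nn.Conv2d(3, 1, 3, padding=1).cuda(local_rank)  # placeholder for the depth network
model = DDP(model, device_ids=[local_rank])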

I think you have to reduce the batch size. Even though I have two 2080 Ti cards, I had to set the batch size to 2.


My GPU is a Titan Xp with 12 GB of memory and the image size is 576x576, but I still get an "out of memory" error even when I set the batch size to 1.
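
If batch size 1 still does not fit, mixed-precision training is a common way to cut memory on a 12 GB card; below is a sketch of the standard torch.cuda.amp pattern, with a placeholder model and random data rather than the repository's training loop. The error message above also suggests setting PYTORCH_CUDA_ALLOC_CONF (e.g. max_split_size_mb:128) to reduce fragmentation.

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 1, 3, padding=1).to(device)            # placeholder network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for _ in range(2):                                           # placeholder loop and data
    images = torch.randn(1, 3, 576, 576, device=device)
    target = torch.randn(1, 1, 576, 576, device=device)
    optimizer.zero_grad()
    # autocast runs the forward pass in float16 where safe, roughly halving
    # activation memory; GradScaler keeps float16 gradients from underflowing.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = nn.functional.l1_loss(model(images), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()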

I am facing the same issue, can you share your solution? @afpapqy @landiaokafeiyan

I modified it a little bit and I can now run without the device error.

In models/swin_transformer_v2.py, line 294:
original: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
modified: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to('cuda:0'))).exp()

This is just one example; you can also take the device from another variable instead of hard-coding 'cuda:0'.
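
Another device-agnostic option (a sketch of my own, not the repository's code) is to register the clamp ceiling as a buffer, so it moves together with the module under .to(...) or DataParallel:

import torch
import torch.nn as nn

class LogitScaleClamp(nn.Module):
    def __init__(self, num_heads=3):
        super().__init__()
        self.logit_scale = nn.Parameter(torch.log(10 * torch.ones(num_heads, 1, 1)))
        # A buffer always lives on the same device as the module's parameters,
        # so the clamp below never mixes cuda and cpu tensors.
        self.register_buffer("logit_scale_max", torch.log(torch.tensor(1. / 0.01)))

    def forward(self):
        return torch.clamp(self.logit_scale, max=self.logit_scale_max).exp()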

Hi @afpapqy @PigBroA
When I test on 3000x4000 images, I have to split the image into several patches, which decreases the performance. Do you have any good ideas to solve this problem?

Thanks in advance.
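
One common mitigation (a sketch under assumptions, not the repository's inference code) is sliding-window inference: run the network on overlapping tiles and average the overlapping predictions, which usually reduces the seams and the accuracy drop compared with non-overlapping patches. Here the model is assumed to map a (1, C, tile, tile) tensor to a (1, 1, tile, tile) depth map, and the image is assumed to be at least tile pixels in each dimension.

import torch

def tiled_depth_inference(model, image, tile=576, overlap=96):
    # image: (1, C, H, W) with H, W >= tile; returns a (1, 1, H, W) depth map.
    _, _, h, w = image.shape
    stride = tile - overlap
    depth = torch.zeros(1, 1, h, w, device=image.device)
    weight = torch.zeros_like(depth)
    ys = list(range(0, h - tile + 1, stride))
    xs = list(range(0, w - tile + 1, stride))
    if ys[-1] != h - tile:            # make sure the bottom/right borders are covered
        ys.append(h - tile)
    if xs[-1] != w - tile:
        xs.append(w - tile)
    with torch.no_grad():
        for y in ys:
            for x in xs:
                pred = model(image[:, :, y:y + tile, x:x + tile])
                depth[:, :, y:y + tile, x:x + tile] += pred
                weight[:, :, y:y + tile, x:x + tile] += 1
    return depth / weight             # average where the tiles overlap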

kmbmjn commented


Thank you for this solution!
In a multi-GPU environment I encountered a similar error between "cuda:0" and "cuda:1", so I used the following modification instead:

original: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
modified: logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).to(self.logit_scale.device))).exp()
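
A further alternative (my own suggestion, not from the repository) is to pass a plain Python float as max; torch.clamp accepts a number as well as a tensor, so no extra tensor, and hence no device, is involved:

modified: logit_scale = torch.clamp(self.logit_scale, max=math.log(1. / 0.01)).exp()  # requires "import math"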