sniklaus/softmax-splatting

How to run the code with a second GPU device ('cuda:1')

Closed this issue · 24 comments

The forward warping functions in softsplat.py produce a warped output only when the device id is 'cuda:0'. With other GPUs, the forward-warped output is identical to the initialized zero tensor. Is there a way to perform the forward warp on GPU devices other than 'cuda:0'?

Should work just fine, @JasonSheng-atp seems to be using it successfully on multiple GPUs: #46 (comment)

Thanks for your reply. @JasonSheng-atp uses multiple GPUs for code execution, but I want to run solely on a secondary GPU ('cuda:1'). Being new to PyTorch and CuPy, I just want to know whether any changes need to be made when moving variables to the GPU device (.cuda()).

A similar error happened to me a long time ago. It might help if you print the related tensors' devices, e.g. print(a.device), or provide a simple Python reproduction script.
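
For example, a minimal reproduction sketch (hypothetical shapes and device, assuming the FunctionSoftsplat interface that softsplat.py in this repo exposes):

import torch
import softsplat  # from this repository

device = torch.device('cuda:1')

# hypothetical inputs, just to exercise the kernel on a non-default device
tenInput = torch.rand(1, 3, 64, 64).to(device)
tenFlow = torch.zeros(1, 2, 64, 64).to(device)
print(tenInput.device, tenFlow.device)  # both should report cuda:1

tenOutput = softsplat.FunctionSoftsplat(tenInput=tenInput, tenFlow=tenFlow, tenMetric=None, strType='average')
print(tenOutput.device, tenOutput.abs().sum())  # a zero sum would mean the warp silently failed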

Thanks for the reply @JasonSheng-atp. I printed the devices for each tensor as you suggested. I got the following output and error:

Tensor 1 device: cuda:3
Tensor 2 device: cuda:3
Flow device: cuda:3
Metric device: cuda:3
Traceback (most recent call last):
  File "run.py", line 64, in <module>
    tenAverage = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='average')
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 362, in FunctionSoftsplat
    tenOutput = _FunctionSoftsplat.apply(tenInput, tenFlow)
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 267, in forward
    cupy_launch('kernel_Softsplat_updateOutput', cupy_kernel('kernel_Softsplat_updateOutput', {
  File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 246, in cupy_launch
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 373, in compile_with_cache
    return _compile_with_cache_cuda(
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 484, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 222, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 224, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

If you know how to resolve this please let me know...

I am not sure since I don't know CuPy, but I guess some tensors may be generated on cuda:0 somewhere in the softmax splatting process; that would explain the illegal memory access error. I suggest running one net solely on a single GPU, and using DDP if you want to run it on multiple GPUs. Please also check whether the flow contains infinities.
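
If that guess is right, the usual fix is to make the current CUDA device match the tensors' device around the kernel launch, since CuPy compiles and launches on whatever device is current (cuda:0 by default). A minimal sketch of that guard with a hypothetical wrapper (I have not checked where exactly softsplat.py launches its kernels):

import torch

def splat_on_input_device(tenInput, tenFlow, fnSplat):
    # CuPy operates on the *current* CUDA device, which defaults to cuda:0;
    # temporarily switch it to the device the tensors actually live on.
    assert tenInput.device == tenFlow.device
    with torch.cuda.device(tenInput.device):
        return fnSplat(tenInput, tenFlow)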

I am running the code solely on cuda:3. I guess the softmax splatting function creates some tensor on cuda:0; I will take a look at that. Also, what do you mean by DDP?

from torch.nn.parallel import DistributedDataParallel as DDP — it uses distributed training, with one process per GPU, to isolate the GPUs. For more information, you can check the official docs.
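
A minimal DDP sketch (hypothetical model and training setup; it assumes a recent PyTorch launched via torchrun, which sets LOCAL_RANK and the rendezvous environment variables):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    local_rank = int(os.environ['LOCAL_RANK'])  # one process per GPU
    torch.cuda.set_device(local_rank)           # pin this process to its own GPU
    dist.init_process_group(backend='nccl')

    net = torch.nn.Conv2d(3, 3, 1).to(local_rank)  # stand-in for the real network
    net = DDP(net, device_ids=[local_rank])
    # ... training loop, everything stays on this one GPU ...

    dist.destroy_process_group()

if __name__ == '__main__':
    main()

# launch with e.g.: torchrun --nproc_per_node=3 train.py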

Thanks @JasonSheng-atp for chiming in!

I added:

tenOne = tenOne.to(torch.device('cuda:3'))
tenTwo = tenTwo.to(torch.device('cuda:3'))
tenFlow = tenFlow.to(torch.device('cuda:3'))
tenMetric = tenMetric.to(torch.device('cuda:3'))

to the provided run.py.

That worked just fine for me. If it doesn't for you @ShrisudhanG then please provide the output of cupy.show_config().

I have already shifted the tensors to the GPU device that is being used currently. I have also printed the device to which each tensor is assigned in the previous message. When I tried to print the output of cupy.show_config(), I got the following error:

Traceback (most recent call last):
  File "run.py", line 60, in <module>
    print(cupy.show_config())
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/__init__.py", line 866, in show_config
    _sys.stdout.write(str(_cupyx.get_runtime_info()))
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupyx/_runtime.py", line 225, in __str__
    props['name'].decode('utf-8')),
AttributeError: 'str' object has no attribute 'decode'

> I have already shifted the tensors to the GPU device that is being used currently.

What happens if you run the provided run.py with the changes outlined in my previous reply?

> I tried to print the output of cupy.show_config()

cupy.show_config() does the printing for you, and it probably returns None hence the error.
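
In other words (a trivial sketch):

import cupy

cupy.show_config()           # writes the report to stdout by itself
# print(cupy.show_config())  # the return value is just None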

I have made the changes you suggested. This is the code that I am running now:

#!/usr/bin/env python

import torch

import cv2
import numpy
import cupy

import softsplat

##########################################################

assert(int(str('').join(torch.__version__.split('.')[0:2])) >= 13) # requires at least pytorch version 1.3.0

##########################################################

def read_flo(strFile):
    with open(strFile, 'rb') as objFile:
        strFlow = objFile.read()
    # end

    assert(numpy.frombuffer(buffer=strFlow, dtype=numpy.float32, count=1, offset=0) == 202021.25)

    intWidth = numpy.frombuffer(buffer=strFlow, dtype=numpy.int32, count=1, offset=4)[0]
    intHeight = numpy.frombuffer(buffer=strFlow, dtype=numpy.int32, count=1, offset=8)[0]

    return numpy.frombuffer(buffer=strFlow, dtype=numpy.float32, count=intHeight * intWidth * 2, offset=12).reshape(intHeight, intWidth, 2)
# end

##########################################################

backwarp_tenGrid = {}

def backwarp(tenInput, tenFlow):
	if str(tenFlow.shape) not in backwarp_tenGrid:
		tenHor = torch.linspace(-1.0 + (1.0 / tenFlow.shape[3]), 1.0 - (1.0 / tenFlow.shape[3]), tenFlow.shape[3]).view(1, 1, 1, -1).expand(-1, -1, tenFlow.shape[2], -1)
		tenVer = torch.linspace(-1.0 + (1.0 / tenFlow.shape[2]), 1.0 - (1.0 / tenFlow.shape[2]), tenFlow.shape[2]).view(1, 1, -1, 1).expand(-1, -1, -1, tenFlow.shape[3])

		backwarp_tenGrid[str(tenFlow.shape)] = torch.cat([ tenHor, tenVer ], 1).to(device)#.cuda()
	# end

	tenFlow = torch.cat([ tenFlow[:, 0:1, :, :] / ((tenInput.shape[3] - 1.0) / 2.0), tenFlow[:, 1:2, :, :] / ((tenInput.shape[2] - 1.0) / 2.0) ], 1)

	return torch.nn.functional.grid_sample(input=tenInput, grid=(backwarp_tenGrid[str(tenFlow.shape)] + tenFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros', align_corners=False)
# end

##########################################################

device = torch.device('cuda:3')
tenOne = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./images/one.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).to(device)
tenTwo = torch.FloatTensor(numpy.ascontiguousarray(cv2.imread(filename='./images/two.png', flags=-1).transpose(2, 0, 1)[None, :, :, :].astype(numpy.float32) * (1.0 / 255.0))).to(device)
tenFlow = torch.FloatTensor(numpy.ascontiguousarray(read_flo('./images/flow.flo').transpose(2, 0, 1)[None, :, :, :])).to(device)

tenMetric = torch.nn.functional.l1_loss(input=tenOne, target=backwarp(tenInput=tenTwo, tenFlow=tenFlow), reduction='none').mean(1, True)

tenOne = tenOne.to(torch.device('cuda:3'))
tenTwo = tenTwo.to(torch.device('cuda:3'))
tenFlow = tenFlow.to(torch.device('cuda:3'))
tenMetric = tenMetric.to(torch.device('cuda:3'))

print('Tensor 1 device:', tenOne.device)
print('Tensor 2 device:', tenTwo.device)
print('Flow device:', tenFlow.device)
print('Metric device:', tenMetric.device)
cupy.show_config()

intTime = 1
fltTime = 1.0
tenSummation = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='summation')
tenAverage = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='average')
tenLinear = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=(0.3 - tenMetric).clip(0.0000001, 1.0), strType='linear') # finding a good linearly metric is difficult, and it is not invariant to translations
tenSoftmax = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=-20.0 * tenMetric, strType='softmax') # -20.0 is a hyperparameter, called 'alpha' in the paper, that could be learned using a torch.Parameter

print('Forward warp summation:', tenSummation.device)
print('Forward warp average:', tenAverage.device)
print('Forward warp linear:', tenLinear.device)
print('Forward warp softmax:', tenSoftmax.device)

And this is the error I get on running this:

Tensor 1 device: cuda:3
Tensor 2 device: cuda:3
Flow device: cuda:3
Metric device: cuda:3
Traceback (most recent call last):
  File "run.py", line 65, in <module>
    cupy.show_config()
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/__init__.py", line 866, in show_config
    _sys.stdout.write(str(_cupyx.get_runtime_info()))
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupyx/_runtime.py", line 225, in __str__
    props['name'].decode('utf-8')),
AttributeError: 'str' object has no attribute 'decode'

Please let me know if I am doing something wrong...

What does nvidia-smi return?

Output of nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   32C    P8     9W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   36C    P8    10W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   33C    P8     9W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    Off  | 00000000:84:00.0 Off |                  N/A |
| 23%   30C    P8     9W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You are using different GPU models; the CuPy kernels probably get compiled for one of them and then fail when used on the other(s). Try using only devices 1 through 3, for example via CUDA_VISIBLE_DEVICES="1,2,3" python yourscript.py.

Sorry for the delayed response. I tried what you suggested and still got the illegal memory access error.

(lf) prasan@jarvis:/media/data/prasan/shrisudhan/softmax-splatting$ CUDA_VISIBLE_DEVICES="1, 2, 3" python run.py
Tensor 1 device: cuda:1
Tensor 2 device: cuda:1
Flow device: cuda:1
Metric device: cuda:1
Traceback (most recent call last):
  File "run.py", line 70, in <module>
    tenAverage = softsplat.FunctionSoftsplat(tenInput=tenOne, tenFlow=tenFlow * fltTime, tenMetric=None, strType='average')
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 362, in FunctionSoftsplat
    tenOutput = _FunctionSoftsplat.apply(tenInput, tenFlow)
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 267, in forward
    cupy_launch('kernel_Softsplat_updateOutput', cupy_kernel('kernel_Softsplat_updateOutput', {
  File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
  File "/media/data/prasan/shrisudhan/softmax-splatting/softsplat.py", line 246, in cupy_launch
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 373, in compile_with_cache
    return _compile_with_cache_cuda(
  File "/media/data/prasan/anaconda3/envs/lf/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 431, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 222, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 224, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Another interesting thing I observed: when I use the command you suggested and run the script with device 'cuda:3', I get the following error:

Traceback (most recent call last):
  File "run.py", line 56, in <module>
    tenOne = tenOne.to(device)
RuntimeError: CUDA error: invalid device ordinal

I don't know why PyTorch fails to recognize the cuda:3 GPU device here. Without the CUDA_VISIBLE_DEVICES setting, the device is recognized.

Can you delete the ~/.cupy/ folder and try again?

> I don't know why PyTorch fails to recognize the cuda:3 GPU device here. Without the CUDA_VISIBLE_DEVICES setting, the device is recognized.

That is expected. If you set CUDA_VISIBLE_DEVICES="1,2,3" then you only have access to three CUDA devices which start being indexed at 0 again: cuda:0, cuda:1 and cuda:2.
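
A small sketch of that renumbering (assuming the four-GPU machine from the nvidia-smi output above):

# launched with: CUDA_VISIBLE_DEVICES="1,2,3" python run.py
import torch

print(torch.cuda.device_count())  # 3, not 4
device = torch.device('cuda:2')   # this is physical GPU 3
# moving a tensor to torch.device('cuda:3') would now raise 'invalid device ordinal'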

I deleted the ~/.cupy/ folder and ran the code again with CUDA_VISIBLE_DEVICES="1,2,3" and device='cuda:2'. I am still encountering the same illegal memory access error.

I am afraid that I have no idea then. My best guess is that something with the mixed-GPU setup is causing issues since it works just fine in the multi-GPU environments that I have encountered so far (but they all had homogeneous GPU configurations).

Okay. Thanks for helping out anyway.

Please share your findings if you end up making it work in your environment, thanks!

I just updated the repo, maybe you will have more luck with the new version.

Hi, sorry for the late response. I will try this and let you know if this works. Thanks!

Any updates by chance? Thanks!

Closing due to inactivity, seems like this is no longer an issue? Feel free to reopen if it is though.