RuntimeError: CUDA error: no kernel image is available for execution on the device
I am running GScream on a V100, and my CUDA version is 11.8. Has anyone encountered a similar issue to the one in the log below?
```
2024-10-17 21:34:38,933 - INFO: save code failed~
2024-10-17 21:34:38,933 - INFO: Optimizing outputs/spinnerf_dataset/1/gscream/
Training progress:   0%|          | 0/30000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 1076, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), dataset, args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, wandb, logger)
  File "train.py", line 433, in training
    voxel_visible_mask, position2D_x, position2D_y = prefilter_position2D(viewpoint_cam, gaussians, pipe, background)
  File "/home/ryan/GScream/gaussian_renderer/__init__.py", line 302, in prefilter_position2D
    return radii_pure>0, position2D_pure_x, position2D_pure_y
RuntimeError: CUDA error: no kernel image is available for execution on the device
Training progress:   0%|          | 0/30000 [00:00<?, ?it/s]
(gscream) root@bell-90:/home/ryan/GScream#
```
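For reference, a quick check like the one below (a sketch, assuming the PyTorch CUDA runtime itself initializes) prints the GPU's compute capability and the architectures the installed PyTorch build ships kernels for. That helps narrow down whether the mismatch is in PyTorch itself or in a compiled extension such as diff-gaussian-rasterization:

```python
import torch

# A V100 should report compute capability (7, 0); an RTX 3090 reports (8, 6).
print("device:", torch.cuda.get_device_name(0))
print("capability:", torch.cuda.get_device_capability(0))

# Architectures this PyTorch build was compiled for, e.g. ['sm_70', 'sm_75', ...].
# If the GPU's sm_XX is missing here, PyTorch itself lacks kernels for this GPU;
# if it is present, the custom CUDA extension is the more likely culprit.
print("arch list:", torch.cuda.get_arch_list())
```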
Hi minyan, I can provide some debugging ideas:
- Since I am using an RTX 3090, I hardcoded the gencode arch to compute_86 and sm_86 in setup.py when compiling 'diff-gaussian-rasterization'. For a Tesla V100, you may try changing it to compute_70 and sm_70 (or another version that matches your GPU) and re-compiling; see the sketch after this list.
- Try lowering the CUDA version to 11.6, as we did, and install the matching PyTorch build; we used pytorch=1.12.1=py3.7_cuda11.6_cudnn8.3.2_0.
- Check if the GPU driver version is sufficient to support CUDA 11.8.
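A minimal sketch of what the setup.py change in the first point might look like, assuming the extension is built with torch.utils.cpp_extension; the source list and module names here are placeholders, and the real setup.py in the submodule has more entries and flags:

```python
# setup.py (sketch) -- the nvcc -gencode flag is the only point of this example.
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

setup(
    name="diff_gaussian_rasterization",
    ext_modules=[
        CUDAExtension(
            name="diff_gaussian_rasterization._C",
            sources=["rasterize_points.cu", "ext.cpp"],  # placeholder sources
            extra_compile_args={
                "nvcc": [
                    # Tesla V100 is compute capability 7.0
                    "-gencode=arch=compute_70,code=sm_70",
                ]
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

After editing, force a clean rebuild of the submodule (for example, delete its build/ directory or reinstall it with pip's --force-reinstall) so the old sm_86-only binary is not reused. If no -gencode flags are hardcoded, many torch extensions also respect TORCH_CUDA_ARCH_LIST="7.0" at build time.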
Sorry, I don't have a Tesla V100 machine to reproduce the bug. Please report back after trying the methods above, and we can discuss further. Thank you!
I changed compute_86 and sm_86 to compute_70 and sm_70 and it worked just fine. Thanks a lot!