eliphatfs/zerorf

CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

Closed this issue · 3 comments

Hey, I am using WSL with Ubuntu 20.04 and CUDA 11.8. I got the following error when I tried to run `python zerorf.py --load-image=examples/ice.png`:

Is there anything I can try?

wandb: Currently logged in as: flandre. Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /mnt/c/Users/msz/Documents/Github_projs/zerorf/results/test/wandb/run-20231228_145824-hufokbtz
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run test
wandb: โญ๏ธ View project at https://wandb.ai/flandre/zerorf
wandb: ๐Ÿš€ View run at https://wandb.ai/flandre/zerorf/runs/hufokbtz
0%| | 0/10000 [00:00<?, ?it/s]2023-12-28 14:58:30,782 - mmgen - INFO - Initialize codes from scratch.
Shape of c2w: torch.Size([1, 6, 4, 4])
Shape of directions: torch.Size([1, 6, 320, 320, 3])
0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "zerorf.py", line 227, in
lv = nerf.train_step(data_entry, optim)['log_vars']
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/models/autoencoders/multiscene_nerf.py", line 207, in train_step
cond_rays_o, cond_rays_d = get_cam_rays(cond_poses, cond_intrinsics, h, w)
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py", line 65, in get_cam_rays
rays_o, rays_d = get_rays(directions, c2w, norm=True)
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py", line 56, in get_rays
rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2) # (*, h, w, 3)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/zerorf.py:227 in <module>
│
│   224 best_psnr = 0.0
│   225
│   226 for j in prog:
│ ❱ 227     lv = nerf.train_step(data_entry, optim)['log_vars']
│   228     lr_sched.step()
│   229     lv.pop('code_rms')
│   230     lv.pop('loss')
│
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/models/autoencoders/multiscene_nerf.py:207 in train_step
│
│   204
│   205         num_scenes, num_imgs, h, w, _ = cond_imgs.size()
│   206         # (num_scenes, num_imgs, h, w, 3)
│ ❱ 207         cond_rays_o, cond_rays_d = get_cam_rays(cond_poses, cond_intrinsics, h, w)
│   208         dt_gamma_scale = self.train_cfg.get('dt_gamma_scale', 0.0)
│   209         # (num_scenes,)
│   210         dt_gamma = dt_gamma_scale / cond_intrinsics[..., :2].mean(dim=(-2, -1))
│
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py:65 in get_cam_rays
│
│   62 def get_cam_rays(c2w, intrinsics, h, w):
│   63     directions = get_ray_directions(
│   64         h, w, intrinsics, norm=False, device=intrinsics.device)  # (num_scenes, num_imgs
│ ❱ 65     rays_o, rays_d = get_rays(directions, c2w, norm=True)
│   66     return rays_o, rays_d
│   67
│   68
│
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py:56 in get_rays
│
│   53     print("Shape of c2w:", c2w.shape)
│   54     print("Shape of directions:", directions.shape)
│   55
│ ❱ 56     rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2)  # (*, h, w, 3)
│   57     rays_o = c2w[..., None, None, :3, 3].expand(rays_d.shape)  # (*, h, w, 3)
│   58     if norm:
│   59         rays_d = F.normalize(rays_d, dim=-1)
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
wandb: WARNING No program path found, not creating job artifact. See https://docs.wandb.ai/guides/launch/create-job
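The failing line boils down to a broadcasted batched matmul. Here is a minimal sketch (values are dummies, shapes taken from the debug prints above) to check whether that operation works in isolation on this GPU/CUDA combination:

```python
import torch

# Shapes taken from the debug prints above; tensor contents are dummies.
c2w = torch.eye(4).expand(1, 6, 4, 4).contiguous().cuda()    # (1, 6, 4, 4)
directions = torch.randn(1, 6, 320, 320, 3, device='cuda')   # (1, 6, 320, 320, 3)

# Same expression as nerf_utils.py:56 — broadcasts (1, 6, 1, 3, 3) against
# (1, 6, 320, 320, 3), which dispatches to cublasSgemmStridedBatched.
rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2)
torch.cuda.synchronize()
print(rays_d.shape)  # expected: torch.Size([1, 6, 320, 320, 3])
```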

Do you have enough VRAM on your GPU? This error can occur if you are close to running out of memory.
Otherwise it looks like a CUDA bug and you should report the instance to NVIDIA.
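One way to check is to query device memory right before the failing call, e.g. with PyTorch's own accounting (a minimal sketch; `torch.cuda.mem_get_info` wraps `cudaMemGetInfo`):

```python
import torch

# Report how much device memory is actually free vs. what PyTorch has reserved.
free_b, total_b = torch.cuda.mem_get_info()
print(f"free:      {free_b / 1024**3:.2f} GiB")
print(f"total:     {total_b / 1024**3:.2f} GiB")
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
```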

I have an RTX 3090 with 22 GB. I will install a different CUDA version and try again. Thank you.
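Before switching toolkits, it may be worth confirming which CUDA build PyTorch itself is using, since under WSL the system toolkit and the runtime bundled with the wheel can disagree. A quick sanity-check sketch:

```python
import torch

# The CUDA version that matters is the one the PyTorch wheel was built
# against, not the system-wide toolkit installed in WSL.
print(torch.__version__)               # e.g. a +cu118 build
print(torch.version.cuda)              # CUDA version PyTorch was built with
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))   # should report the RTX 3090
print(torch.backends.cudnn.version())
```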

You may try using the provided Docker image.