CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
Closed this issue · 3 comments
Hey, I am using WSL with Ubuntu 20.04 and CUDA 11.8. I got the following error when I tried to run `python zerorf.py --load-image=examples/ice.png`.
Is there anything I can try?
wandb: Currently logged in as: flandre. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /mnt/c/Users/msz/Documents/Github_projs/zerorf/results/test/wandb/run-20231228_145824-hufokbtz
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run test
wandb: ⭐️ View project at https://wandb.ai/flandre/zerorf
wandb: 🚀 View run at https://wandb.ai/flandre/zerorf/runs/hufokbtz
0%| | 0/10000 [00:00<?, ?it/s]2023-12-28 14:58:30,782 - mmgen - INFO - Initialize codes from scratch.
Shape of c2w: torch.Size([1, 6, 4, 4])
Shape of directions: torch.Size([1, 6, 320, 320, 3])
0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "zerorf.py", line 227, in <module>
lv = nerf.train_step(data_entry, optim)['log_vars']
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/models/autoencoders/multiscene_nerf.py", line 207, in train_step
cond_rays_o, cond_rays_d = get_cam_rays(cond_poses, cond_intrinsics, h, w)
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py", line 65, in get_cam_rays
rays_o, rays_d = get_rays(directions, c2w, norm=True)
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py", line 56, in get_rays
rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2) # (*, h, w, 3)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/zerorf.py:227 in <module>                         │
│                                                                                                  │
│   224 best_psnr = 0.0                                                                            │
│   225                                                                                            │
│   226 for j in prog:                                                                             │
│ ❱ 227     lv = nerf.train_step(data_entry, optim)['log_vars']                                    │
│   228     lr_sched.step()                                                                        │
│   229     lv.pop('code_rms')                                                                     │
│   230     lv.pop('loss')                                                                         │
│                                                                                                  │
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/models/autoencoders/multiscene_nerf.py:207 in │
│ train_step                                                                                       │
│                                                                                                  │
│   204                                                                                            │
│   205         num_scenes, num_imgs, h, w, _ = cond_imgs.size()                                   │
│   206         # (num_scenes, num_imgs, h, w, 3)                                                  │
│ ❱ 207         cond_rays_o, cond_rays_d = get_cam_rays(cond_poses, cond_intrinsics, h, w)         │
│   208         dt_gamma_scale = self.train_cfg.get('dt_gamma_scale', 0.0)                         │
│   209         # (num_scenes,)                                                                    │
│   210         dt_gamma = dt_gamma_scale / cond_intrinsics[..., :2].mean(dim=(-2, -1))            │
│                                                                                                  │
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py:65 in get_cam_rays   │
│                                                                                                  │
│    62 def get_cam_rays(c2w, intrinsics, h, w):                                                   │
│    63     directions = get_ray_directions(                                                       │
│    64         h, w, intrinsics, norm=False, device=intrinsics.device) # (num_scenes, num_imgs    │
│ ❱  65     rays_o, rays_d = get_rays(directions, c2w, norm=True)                                  │
│    66     return rays_o, rays_d                                                                  │
│    67                                                                                            │
│    68                                                                                            │
│                                                                                                  │
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py:56 in get_rays       │
│                                                                                                  │
│    53     print("Shape of c2w:", c2w.shape)                                                      │
│    54     print("Shape of directions:", directions.shape)                                        │
│    55                                                                                            │
│ ❱  56     rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2) # (*, h, w, 3)          │
│    57     rays_o = c2w[..., None, None, :3, 3].expand(rays_d.shape) # (*, h, w, 3)               │
│    58     if norm:                                                                               │
│    59         rays_d = F.normalize(rays_d, dim=-1)                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
wandb: WARNING No program path found, not creating job artifact. See https://docs.wandb.ai/guides/launch/create-job
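The failing line is an ordinary broadcast batched matmul. A minimal sketch (hypothetical, CPU-only, using the exact tensor shapes printed in the log above) reproduces the `get_rays` math without touching the GPU; if it runs on CPU, the shapes are valid and the failure is in the CUDA/cuBLAS setup (e.g. a driver/toolkit mismatch under WSL) rather than in the model code:

```python
# Hypothetical CPU repro of the ray computation from lib/core/utils/nerf_utils.py,
# using the tensor shapes reported in the log.
import torch

c2w = torch.randn(1, 6, 4, 4)                # camera-to-world poses
directions = torch.randn(1, 6, 320, 320, 3)  # per-pixel ray directions

# Same batched matmul that triggers CUBLAS_STATUS_INVALID_VALUE on GPU:
# (1, 6, 320, 320, 3) @ (1, 6, 1, 3, 3) -> (1, 6, 320, 320, 3)
rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2)
rays_o = c2w[..., None, None, :3, 3].expand(rays_d.shape)

print(rays_d.shape)  # torch.Size([1, 6, 320, 320, 3])
print(rays_o.shape)  # torch.Size([1, 6, 320, 320, 3])
```

If the same code also fails after moving both tensors to `.cuda()`, that points at the PyTorch/CUDA installation rather than ZeroRF itself.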
Do you have enough VRAM on your GPU? This error can occur if you are close to running out of memory.
Otherwise it looks like a CUDA bug and you should report the instance to NVIDIA.
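One quick way to check the memory hypothesis (a sketch, assuming a PyTorch build with CUDA support) is to query free vs. total VRAM just before training starts, using `torch.cuda.mem_get_info`:

```python
# Sketch: report free vs. total VRAM. A low "free" number right before the
# failing matmul would support the out-of-memory theory from the comment above.
import torch

if torch.cuda.is_available():
    free_b, total_b = torch.cuda.mem_get_info()
    print(f"VRAM free: {free_b / 1e9:.1f} GB / total: {total_b / 1e9:.1f} GB")
else:
    print("CUDA not available")
```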
I have 22 GB on an RTX 3090. I will install a different CUDA version and try again. Thank you.
You may try using the provided Docker image.