eliphatfs/zerorf

CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

Closed this issue · 3 comments

Hey, I am using WSL with Ubuntu 20.04 and CUDA 11.8. I got the following error when I tried to run `python zerorf.py --load-image=examples/ice.png`:

Is there anything I can try?

wandb: Currently logged in as: flandre. Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /mnt/c/Users/msz/Documents/Github_projs/zerorf/results/test/wandb/run-20231228_145824-hufokbtz
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run test
wandb: โญ๏ธ View project at https://wandb.ai/flandre/zerorf
wandb: ๐Ÿš€ View run at https://wandb.ai/flandre/zerorf/runs/hufokbtz
0%| | 0/10000 [00:00<?, ?it/s]2023-12-28 14:58:30,782 - mmgen - INFO - Initialize codes from scratch.
Shape of c2w: torch.Size([1, 6, 4, 4])
Shape of directions: torch.Size([1, 6, 320, 320, 3])
0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "zerorf.py", line 227, in
lv = nerf.train_step(data_entry, optim)['log_vars']
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/models/autoencoders/multiscene_nerf.py", line 207, in train_step
cond_rays_o, cond_rays_d = get_cam_rays(cond_poses, cond_intrinsics, h, w)
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py", line 65, in get_cam_rays
rays_o, rays_d = get_rays(directions, c2w, norm=True)
File "/mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py", line 56, in get_rays
rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2) # (*, h, w, 3)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/zerorf.py:227 in <module>
│
│   224 best_psnr = 0.0
│   225
│   226 for j in prog:
│ ❱ 227     lv = nerf.train_step(data_entry, optim)['log_vars']
│   228     lr_sched.step()
│   229     lv.pop('code_rms')
│   230     lv.pop('loss')
│
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/models/autoencoders/multiscene_nerf.py:207 in train_step
│
│   204
│   205         num_scenes, num_imgs, h, w, _ = cond_imgs.size()
│   206         # (num_scenes, num_imgs, h, w, 3)
│ ❱ 207         cond_rays_o, cond_rays_d = get_cam_rays(cond_poses, cond_intrinsics, h, w)
│   208         dt_gamma_scale = self.train_cfg.get('dt_gamma_scale', 0.0)
│   209         # (num_scenes,)
│   210         dt_gamma = dt_gamma_scale / cond_intrinsics[..., :2].mean(dim=(-2, -1))
│
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py:65 in get_cam_rays
│
│   62 def get_cam_rays(c2w, intrinsics, h, w):
│   63     directions = get_ray_directions(
│   64         h, w, intrinsics, norm=False, device=intrinsics.device)  # (num_scenes, num_imgs
│ ❱ 65     rays_o, rays_d = get_rays(directions, c2w, norm=True)
│   66     return rays_o, rays_d
│   67
│   68
│
│ /mnt/c/Users/msz/Documents/Github_projs/zerorf/lib/core/utils/nerf_utils.py:56 in get_rays
│
│   53     print("Shape of c2w:", c2w.shape)
│   54     print("Shape of directions:", directions.shape)
│   55
│ ❱ 56     rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2)  # (*, h, w, 3)
│   57     rays_o = c2w[..., None, None, :3, 3].expand(rays_d.shape)  # (*, h, w, 3)
│   58     if norm:
│   59         rays_d = F.normalize(rays_d, dim=-1)
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
wandb: WARNING No program path found, not creating job artifact. See https://docs.wandb.ai/guides/launch/create-job
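The failing line boils down to a broadcasted batched matmul. Here is a minimal sketch (values are dummies, shapes taken from the debug prints above) to check whether that operation works in isolation on this GPU/CUDA combination:

```python
import torch

# Shapes taken from the debug prints above; tensor contents are dummies.
c2w = torch.eye(4).expand(1, 6, 4, 4).contiguous().cuda()    # (1, 6, 4, 4)
directions = torch.randn(1, 6, 320, 320, 3, device='cuda')   # (1, 6, 320, 320, 3)

# Same expression as nerf_utils.py:56 — broadcasts (1, 6, 1, 3, 3) against
# (1, 6, 320, 320, 3), which dispatches to cublasSgemmStridedBatched.
rays_d = directions @ c2w[..., None, :3, :3].transpose(-1, -2)
torch.cuda.synchronize()
print(rays_d.shape)  # expected: torch.Size([1, 6, 320, 320, 3])
```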

Do you have enough VRAM on your GPU? This error can occur if you are close to running out of memory.
Otherwise it looks like a CUDA bug and you should report the instance to NVIDIA.
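One way to check is to query device memory right before the failing call, e.g. with PyTorch's own accounting (a minimal sketch; `torch.cuda.mem_get_info` wraps `cudaMemGetInfo`):

```python
import torch

# Report how much device memory is actually free vs. what PyTorch has reserved.
free_b, total_b = torch.cuda.mem_get_info()
print(f"free:      {free_b / 1024**3:.2f} GiB")
print(f"total:     {total_b / 1024**3:.2f} GiB")
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
```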

I have an RTX 3090 with 22 GB. I will install a different CUDA version and try again. Thank you.
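Before switching toolkits, it may be worth confirming which CUDA build PyTorch itself is using, since under WSL the system toolkit and the runtime bundled with the wheel can disagree. A quick sanity-check sketch:

```python
import torch

# The CUDA version that matters is the one the PyTorch wheel was built
# against, not the system-wide toolkit installed in WSL.
print(torch.__version__)               # e.g. a +cu118 build
print(torch.version.cuda)              # CUDA version PyTorch was built with
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))   # should report the RTX 3090
print(torch.backends.cudnn.version())
```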

You may try using the provided Docker image.