tianyu0207/PEBAL

RuntimeError: CUDA error: device-side assert triggered

Runist opened this issue · 3 comments

What is causing this problem?

The training log is:
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [20430,0,0], thread: [34,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
epoch (0) | gambler_loss: 10.962 energy_loss: 7.645 : 50%|███████████████████████▌ | 1/2 [00:21<00:21, 21.57s/it]
Traceback (most recent call last):
  File "code/main.py", line 151, in <module>
    main(-1, 1, config=config, args=args)
  File "code/main.py", line 107, in main
    optimizer=optimizer)
  File "/home/rhdai/workspace/code/PEBAL/code/engine/trainer.py", line 51, in train
    loss = self.loss1(pred=in_logits, targets=in_target, wrong_sample=False)
  File "/home/rhdai/miniconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/rhdai/workspace/code/PEBAL/code/losses.py", line 103, in forward
    gambler_loss = gambler_loss[~mask].log()
RuntimeError: CUDA error: device-side assert triggered
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [19955,0,0], thread: [52,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [19955,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [19955,0,0], thread: [54,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [19955,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [19955,0,0], thread: [56,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [19955,0,0], thread: [57,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
wandb:
wandb:
wandb: Run history:
wandb:  energy_loss ▁
wandb: gambler_loss ▁
wandb:  global_step ▁▁
wandb:
wandb: Run summary:
wandb:  energy_loss 7.64503
wandb: gambler_loss 10.96237
wandb:  global_step 0
wandb:
wandb: Synced your_pebal_exp: https://wandb.ai/runist/OoD_Segmentation/runs/359y48y4

Sorry, I haven't encountered this problem before. Could you provide more details so I can assist you? Thanks.

I have solved it. It happened because I didn't use your method to prepare the copy data; I just passed the VOC data path.

But I have a new problem, so I will open a new issue.
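For anyone hitting the same assert: an "index out of bounds" failure in the scatter/gather kernel is commonly caused by target labels that are not valid class indices, which is consistent with the fix described above (data that was not prepared the way the repo expects). Below is a minimal sketch of a check to run on the CPU before training. The helper name `out_of_range_labels`, the class count of 19, and the ignore value of 255 are assumptions for illustration, not values taken from the PEBAL code.

```python
from collections import Counter
from typing import Dict, Iterable


def out_of_range_labels(labels: Iterable[int], num_classes: int,
                        ignore_index: int = 255) -> Dict[int, int]:
    """Count label values that cannot be used as class indices.

    Hypothetical helper: returns {label_value: pixel_count} for every value
    that is neither in [0, num_classes) nor equal to ignore_index.
    """
    counts = Counter(int(v) for v in labels)
    return {
        value: n
        for value, n in sorted(counts.items())
        if value != ignore_index and not (0 <= value < num_classes)
    }


# Made-up labels: ids 19 and 20 are not valid for an assumed 19-class model.
sample = [0, 5, 18, 19, 20, 255, 20]
bad = out_of_range_labels(sample, num_classes=19)
print(bad)  # {19: 1, 20: 2}
```

With a PyTorch label tensor, passing `target.unique().tolist()` keeps the check cheap; a non-empty result points at the dataset preparation rather than the loss code.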