jfzhang95/pytorch-deeplab-xception

RuntimeError: CUDA error: device-side assert triggered

ycc66104116 opened this issue · 4 comments

Hi, I recently ran DeepLab V3+ on my own dataset and got the error below.
I think it is related to the number of classes, but I am sure my dataset has 6 classes + 1 (background), and I have changed the number in utils.py and .py. I really don't know how to fix it. Does anyone know what causes this? I would really appreciate any help.

BTW, I previously trained successfully with 2 classes, but when I switched to another dataset and changed the number of classes to 7, this error appeared.
.py is modified from pascal.py; I only changed the class names, the number of classes, and the base directory to match my dataset. Everything else is the same as in pascal.py.

--------------my error message-----------
C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\NLLLoss2d.cu:95: block: [0,0,0], thread: [708,0,0] Assertion t >= 0 && t < n_classes failed.
C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\NLLLoss2d.cu:95: block: [0,0,0], thread: [709,0,0] Assertion t >= 0 && t < n_classes failed.
C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\NLLLoss2d.cu:95: block: [0,0,0], thread: [710,0,0] Assertion t >= 0 && t < n_classes failed.
...
Traceback (most recent call last):
File "train.py", line 388, in
main()
File "train.py", line 374, in main
trainer.training(epoch)
File "train.py", line 134, in training
loss = self.criterion(output, target)
File "N:\pytorch-deeplab-xception-master\utils\loss.py", line 28, in CrossEntropyLoss
loss = criterion(logit, target.long())
File "C:\Users\LOC\anaconda3\envs\envfordeeplab1229\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\LOC\anaconda3\envs\envfordeeplab1229\lib\site-packages\torch\nn\modules\loss.py", line 1152, in forward
label_smoothing=self.label_smoothing)
File "C:\Users\LOC\anaconda3\envs\envfordeeplab1229\lib\site-packages\torch\nn\functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
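
The assertion `t >= 0 && t < n_classes` in the log above means that some target pixel holds a value outside the range of valid class indices. A minimal sketch for checking a label image before training could look like the following. The function name `find_invalid_labels` and the `ignore_index=255` default are assumptions made for illustration, not something confirmed in this thread; adjust them to match your loss configuration.

```python
import numpy as np

def find_invalid_labels(mask, n_classes, ignore_index=255):
    """Return the sorted label values in `mask` that fall outside [0, n_classes).

    `ignore_index` (assumed to be 255 here) is skipped, because the loss
    does not check pixels carrying that value.
    """
    values = np.unique(np.asarray(mask))  # np.unique returns sorted values
    return [int(v) for v in values
            if v != ignore_index and not (0 <= v < n_classes)]

# Hypothetical 2x3 label image for a 7-class setup (valid indices 0..6).
mask = np.array([[0, 1, 6],
                 [7, 128, 255]], dtype=np.uint8)
print(find_invalid_labels(mask, n_classes=7))  # [7, 128]
```

Running such a check over every label image in the dataset should show which files contain values the loss rejects. A non-empty result usually suggests the masks store colors or raw intensities instead of class indices.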

I have run into the same problem. Have you solved it yet? A month ago I could run the code without any problem, but this time, after getting the code from git again, it reported this error, which is strange.

Yes, I can run the code now; however, I haven't tried multiple classes yet. My data currently contains only one kind of target plus background.
I converted my label images to class indices, so each label image contains only the values 0 and 1 (since there are only 2 classes for now).
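
Converting label images to class indices, as described above, can be sketched as a simple lookup. The function `remap_mask`, the example mapping `{0: 0, 255: 1}`, and the choice to send unmapped values to `ignore_index=255` are all assumptions for illustration; the actual pixel values depend on how the dataset was annotated.

```python
import numpy as np

def remap_mask(mask, value_to_index, ignore_index=255):
    """Map raw pixel values to contiguous class indices.

    Any value missing from `value_to_index` becomes `ignore_index`
    (assumed 255 here), so it cannot trip the loss assertion.
    """
    mask = np.asarray(mask)
    out = np.full(mask.shape, ignore_index, dtype=np.uint8)
    for value, index in value_to_index.items():
        out[mask == value] = index
    return out

# Hypothetical binary mask saved as 0 (background) and 255 (target).
raw = np.array([[0, 255],
                [255, 0]], dtype=np.uint8)
indexed = remap_mask(raw, {0: 0, 255: 1})
print(indexed.tolist())  # [[0, 1], [1, 0]]
```

For a 7-class dataset the same idea applies with a mapping that covers all seven raw values, so the remapped masks contain only 0 through 6 (plus the ignore value, if used).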

hey, did you fix the problem?

hello, did you fix the problem?