About RuntimeError
small-beebee opened this issue · 2 comments
SoftPool_TRAIN train [4: 1517/3621] emd1: 0.008102 emd2: 0.004063 emd3: 0.005554 emd4: 0.109300
SoftPool_TRAIN train [4: 1518/3621] emd1: 0.007283 emd2: 0.004718 emd3: 0.005535 emd4: 0.083149
SoftPool_TRAIN train [4: 1519/3621] emd1: 0.007046 emd2: 0.005290 emd3: 0.005096 emd4: 0.093155
SoftPool_TRAIN train [4: 1520/3621] emd1: 0.006560 emd2: 0.003878 emd3: 0.003762 emd4: 0.094298
SoftPool_TRAIN train [4: 1521/3621] emd1: 0.008697 emd2: 0.004597 emd3: 0.004208 emd4: 0.093892
SoftPool_TRAIN train [4: 1522/3621] emd1: 0.004853 emd2: 0.003058 emd3: 0.005057 emd4: 0.095074
SoftPool_TRAIN train [4: 1523/3621] emd1: 0.006060 emd2: 0.003544 emd3: 0.003586 emd4: 0.098028
SoftPool_TRAIN train [4: 1524/3621] emd1: 0.006612 emd2: 0.003719 emd3: 0.004614 emd4: 0.086453
SoftPool_TRAIN train [4: 1525/3621] emd1: 0.006943 emd2: 0.003364 emd3: 0.004205 emd4: 0.111324
SoftPool_TRAIN train [4: 1526/3621] emd1: 0.008450 emd2: 0.005112 emd3: 0.004640 emd4: 0.080969
SoftPool_TRAIN train [4: 1527/3621] emd1: 0.005002 emd2: 0.002095 emd3: 0.002938 emd4: 0.086772
SoftPool_TRAIN train [4: 1528/3621] emd1: 0.004835 emd2: 0.002619 emd3: 0.002841 emd4: 0.090955
SoftPool_TRAIN train [4: 1529/3621] emd1: 0.008848 emd2: 0.003807 emd3: 0.004815 emd4: 0.116368
SoftPool_TRAIN train [4: 1530/3621] emd1: 0.009957 emd2: 0.005667 emd3: 0.005009 emd4: 0.136988
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [240,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [368,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [144,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [208,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [336,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [176,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [304,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [6,0,0], thread: [272,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [2,0,0], thread: [368,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:130: void THCudaTensor_scatterKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [2,0,0], thread: [336,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorScatterGather.cu line=194 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train.py", line 212, in <module>
loss_net.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:194
First of all, thank you for your excellent contribution. I have been able to start training on the ShapeNet dataset, but the error above occurred partway through training. I have spent a lot of time on it and have not been able to solve the problem. I hope you can help. Thank you!
My operating environment is as follows:
CUDA: 10.0
torch: 1.12.0
Hi, can you try fine-tuning from a pretrained model, e.g. with '--model log/wo-unet_shapenet/network.pth'? It seems that the model does train at first.
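It may also help to find out which index goes out of range. The assertion in the log says that a scatter index fell outside `[0, tensor.sizes[dim])`. CUDA errors are reported asynchronously, so the Python traceback does not necessarily point at the call that failed; running with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback more accurate. Below is a minimal sketch of a host-side bounds check that could be called before a suspect `scatter`/`gather`. The helper name `check_scatter_index` and the toy tensors are hypothetical and not part of the SoftPool code.

```python
import torch


def check_scatter_index(index: torch.Tensor, target: torch.Tensor, dim: int) -> None:
    """Hypothetical helper: raise IndexError on the host if any value in
    `index` lies outside [0, target.size(dim))."""
    if index.numel() == 0:
        return
    size = target.size(dim)
    lo, hi = int(index.min()), int(index.max())
    if lo < 0 or hi >= size:
        raise IndexError(
            f"scatter index out of range: min={lo}, max={hi}, "
            f"valid range is [0, {size - 1}] along dim {dim}"
        )


# Toy example on the CPU (assumed shapes, for illustration only).
target = torch.zeros(2, 4)
src = torch.ones(2, 3)
good = torch.tensor([[0, 1, 3], [2, 2, 0]])
bad = torch.tensor([[0, 1, 4], [2, 2, 0]])  # 4 is invalid for a dimension of size 4

check_scatter_index(good, target, dim=1)  # passes silently
result = target.scatter(1, good, src)

try:
    check_scatter_index(bad, target, dim=1)
    caught = False
except IndexError as exc:
    caught = True
    message = str(exc)
```

Because the check calls `int(...)` on the index minimum and maximum, it forces a GPU synchronization when the tensors live on the device, so it is better suited to debugging than to a production training loop.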
Thanks!