CUDA error: misaligned address
DiantaoTu opened this issue · 3 comments
Recently, I used RoMa to convert network output to a rotation matrix. The network outputs 9 numbers, and I use roma.procrustes to convert them to a 3x3 rotation matrix. However, I hit this CUDA error:
File "/home/tdt/PanoDepth/geometry.py", line 114, in transformation_from_parameters
rot_matrix = roma.procrustes(rotation, force_rotation=True)
File "/home/tdt/anaconda3/envs/panodepth/lib/python3.8/site-packages/roma/mappings.py", line 77, in procrustes
R, DS = _ProcrustesManualDerivatives.apply(M, force_rotation, regularization,gradient_eps)
File "/home/tdt/anaconda3/envs/panodepth/lib/python3.8/site-packages/roma/mappings.py", line 22, in forward
flip = (torch.det(U) * torch.det(V) < 0)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To debug the problem further, I added CUDA_LAUNCH_BLOCKING=1 to my command line, giving CUDA_LAUNCH_BLOCKING=1 python train.py.
With that setting the error disappeared and training ran normally. But CUDA_LAUNCH_BLOCKING=1 forces every kernel launch to run synchronously, which makes training less efficient.
So, how can I solve the problem?
My environment:
Ubuntu 20.04
CUDA 11.6
PyTorch 1.12.1
RoMa 1.4.0
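For the CUDA_LAUNCH_BLOCKING=1 debugging run described above, the variable can also be set from inside the script, so that synchronous launches are enabled only on request and normal runs keep full speed. The following is a minimal sketch, not part of RoMa or PyTorch: the helper name enable_blocking_launches and the DEBUG_CUDA switch are hypothetical. It assumes the variable is set before CUDA is initialized; setting it before importing torch is the conservative choice.

```python
import os


def enable_blocking_launches(debug: bool) -> bool:
    """Set CUDA_LAUNCH_BLOCKING=1 for this process when `debug` is true.

    Hypothetical helper. Returns True if the variable was set.
    """
    if debug:
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
        return True
    return False


# DEBUG_CUDA is a made-up switch for this sketch, e.g.:
#   DEBUG_CUDA=1 python train.py
debug_requested = os.environ.get("DEBUG_CUDA", "0") == "1"
enable_blocking_launches(debug_requested)

# Import torch only after this point, so the setting is already in place
# when CUDA is initialized:
# import torch
```

Because the error in this thread went away under synchronous launches, this is mainly useful for getting an accurate stack trace when the failure can still be reproduced.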
Some users also reported to me that it could be a hidden memory error, and that reducing the batch size made the error disappear.
Thanks for the reply. I also found that this error only appears during training; inference is fine. It also only occurs with one particular dataset, so I just skip that data.
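The thread does not say why that one dataset triggers the error. If the cause were malformed network outputs (an assumption, not something confirmed here), skipping could be done per sample instead of per dataset. The sketch below is plain Python and is not RoMa's API: it keeps only samples whose 9 output values are all finite and records the IDs of the rest. The names is_valid_rotation_input and split_samples are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]


def is_valid_rotation_input(values: Sequence[float]) -> bool:
    """Return True if `values` holds exactly 9 finite numbers."""
    return len(values) == 9 and all(math.isfinite(v) for v in values)


def split_samples(samples: Iterable[Sample]) -> Tuple[List[Sample], List[str]]:
    """Separate usable samples from the IDs of samples to skip.

    Assumption for this sketch: a sample is (sample_id, nine_values).
    """
    kept: List[Sample] = []
    skipped_ids: List[str] = []
    for sample_id, values in samples:
        if is_valid_rotation_input(values):
            kept.append((sample_id, values))
        else:
            skipped_ids.append(sample_id)
    return kept, skipped_ids
```

Logging the skipped IDs would also show whether the failures really cluster in a single dataset. If every value turns out to be finite, the cause lies elsewhere, for example in the memory issue mentioned in the earlier comment.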