Training code freezes when saving best checkpoint
aemior opened this issue · 0 comments
I am training nanodet with a MobileOne backbone on 4x 4090 GPUs with a batch size of 50; memory usage is about 23.3 GB per GPU. The training code freezes when saving the best checkpoint at this line:
nanodet/nanodet/trainer/task.py
Line 273 in 4d85d0c
Any ideas on how to debug this?
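One generic way to see where a hung Python process is stuck is to have it dump its own stack traces after a timeout, using the standard-library `faulthandler` module. The sketch below is not specific to nanodet: the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` and the timeout value are hypothetical, and it assumes you can add two calls around the checkpoint-saving code on every rank.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=600.0, stream=None):
    """Dump all thread tracebacks if not disarmed within timeout_s seconds.

    Hypothetical helper: `stream` must be a real file (it needs fileno())
    and has to stay open for as long as the watchdog is armed.
    """
    target = stream if stream is not None else sys.stderr
    # repeat=True dumps again every timeout_s seconds while still stuck;
    # exit=False leaves the process running so it can be inspected further.
    faulthandler.dump_traceback_later(
        timeout_s, repeat=True, file=target, exit=False
    )


def disarm_hang_watchdog():
    """Cancel the pending dump once the guarded section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` just before the save and `disarm_hang_watchdog()` just after it would make each of the 4 processes print the line it is blocked on if the save does not finish in time. Comparing the traces across ranks should show whether all processes are stuck in the same place or are waiting on each other.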