RangiLyu/nanodet

Training freezes when saving the best checkpoint

aemior opened this issue · 0 comments

aemior commented

I am training nanodet on 4x 4090 GPUs with a MobileOne backbone and a batch size of 50. Memory usage is about 23.3 GB per GPU. Training freezes when it saves the best checkpoint, at this line:

self.trainer.save_checkpoint(

Any ideas on how to debug this?
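One generic way to narrow down a hang like this is to have every process periodically dump its Python stack, so it becomes visible which rank is stuck and where. Below is a minimal sketch using only the standard-library `faulthandler` module. It is not part of nanodet. The helper names `arm_hang_dump` and `disarm_hang_dump` are hypothetical, and reading the rank from the `LOCAL_RANK` environment variable is an assumption about how the processes are launched.

```python
import faulthandler
import os


def arm_hang_dump(timeout_s=600.0, log_dir="."):
    """Dump all thread stacks every `timeout_s` seconds to a per-rank file.

    Hypothetical helper: LOCAL_RANK is assumed to be set by the launcher;
    it falls back to "0" when the variable is missing.
    """
    rank = os.environ.get("LOCAL_RANK", "0")
    path = os.path.join(log_dir, f"hang_rank{rank}.log")
    # The file must stay open for as long as the watchdog is armed.
    log_file = open(path, "w")
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    return log_file, path


def disarm_hang_dump(log_file):
    """Cancel the periodic dump and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

Calling `arm_hang_dump()` once at the start of the training script in every process, with a timeout longer than a normal checkpoint save, should leave one `hang_rank*.log` file per rank. Comparing them would show whether all ranks are waiting inside the checkpoint call or whether some ranks are blocked elsewhere.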