CUDA memory usage continuously increases
vlfom opened this issue · 3 comments
Dear authors,
Thank you for the great work and clean code.
I am using the default CenterNet2 configuration (from Base-CenterNet2.yaml). However, during training I observe that the memory reserved by CUDA keeps increasing until training fails with a CUDA OOM error. When I replace CenterNet2 with the default RPN, the issue disappears.
I tried adding gc.collect() and torch.cuda.empty_cache() to the training loop, with no success.
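For reference, the workaround I tried looks roughly like this (a minimal sketch; `trainer.run_step()` and the logging period are placeholders from my code, not from this repository):

```python
import gc

import torch


def train_with_memory_logging(trainer, max_iter, log_period=100):
    """Training loop with the gc/empty_cache workaround and memory logging.

    `trainer` is assumed to expose a detectron2-style `run_step()`;
    `log_period` is an arbitrary logging interval.
    """
    for iteration in range(max_iter):
        trainer.run_step()

        if iteration % log_period == 0:
            # Force Python garbage collection and release cached CUDA blocks.
            gc.collect()
            torch.cuda.empty_cache()

            # Log allocated vs. reserved memory to confirm the steady growth.
            allocated = torch.cuda.memory_allocated() / 1024 ** 2
            reserved = torch.cuda.memory_reserved() / 1024 ** 2
            print(f"iter {iteration}: allocated={allocated:.0f} MiB, "
                  f"reserved={reserved:.0f} MiB")
```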
Have you noticed such behavior in the past, or could you please provide some hints on what could be the issue? Below I also provide some reference screenshots.
Note: in my project, a few things differ from the above configuration: I train on 50% of the COCO dataset and I use LazyConfig to initialize the model. However, I reimplemented the configuration twice and both versions face the same issue, so it is unlikely that there is a bug in my code.
(Observe that memory allocation keeps increasing in both screenshots.)
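For completeness, this is roughly how I build the model from my LazyConfig (a minimal sketch; the config path is a placeholder for my own lazy-config translation of Base-CenterNet2.yaml, not a file from this repository):

```python
from detectron2.config import LazyConfig, instantiate

# Placeholder path: my own LazyConfig translation of Base-CenterNet2.yaml.
cfg = LazyConfig.load("configs/my_centernet2_lazy.py")

# Build the model exactly as described by the lazy config.
model = instantiate(cfg.model)
model.to("cuda")
```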
Hi!
I am facing the same issue. I tried replacing the CustomCascadeROIHeads with the StandardROIHeads to check whether they are the cause, but the same problem persists. I have the feeling that the problem is in CenterNet, but I have not been able to pinpoint where.
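For reference, the swap I tried looks roughly like this (a minimal sketch on top of the yacs config; I am assuming the repo's add_centernet_config helper and the Base-CenterNet2.yaml path, which may differ in your setup):

```python
from detectron2.config import get_cfg
from centernet.config import add_centernet_config  # from the CenterNet2 repo

cfg = get_cfg()
add_centernet_config(cfg)
# Path is a placeholder for whichever CenterNet2 config you train with.
cfg.merge_from_file("configs/Base-CenterNet2.yaml")

# Swap the custom cascade heads for detectron2's standard ROI heads to
# check whether the second stage is responsible for the growing memory.
cfg.MODEL.ROI_HEADS.NAME = "StandardROIHeads"
```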
I've encountered this issue as well. It seems to happen with the two-stage CenterNet2 models. The workaround that I've found is running the model with the following versions: detectron2=v0.6, pytorch=1.8.1, python=3.6, and cuda=11.1
Thank you! It seems to have solved the problem here as well!