Weird memory allocation, leading to OOM easily.
wondervictor opened this issue · 1 comment
wondervictor commented
Hi, I ran into a weird OOM problem (possibly a bug) when using torchcv to train a semantic segmentation model. GPU memory increases dramatically, sometimes to the point of OOM, right when training starts. The problem also occurs with SFNet.
I inserted the lines below into torchcv/model/seg/nets/sfnet.py (line 244 in 98c7299):
print("max mem: {:.3f} GB".format(torch.cuda.max_memory_allocated()/1024/1024/1024))
torch.cuda.reset_max_memory_allocated()
This printed the following peak allocated memory:
# first iteration
max mem: 9.849 GB
max mem: 9.849 GB
max mem: 9.849 GB
max mem: 9.849 GB
# second iteration
max mem: 5.010 GB
max mem: 5.010 GB
max mem: 5.010 GB
max mem: 5.010 GB
# third iteration
max mem: 5.016 GB
max mem: 5.016 GB
max mem: 5.016 GB
max mem: 5.016 GB
....
GPU memory tends to spike at the start of training. Are there any clues about what causes this?
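For anyone who wants to reproduce the measurement, here is a minimal sketch of the same idea wrapped in a helper. The names `format_peak` and `log_and_reset_peak` are made up for illustration and are not part of torchcv. The sketch uses `torch.cuda.reset_peak_memory_stats()`, which I believe is the newer equivalent of `reset_max_memory_allocated()` in recent PyTorch releases; if your version lacks it, keep the original call.

```python
try:
    import torch
except ImportError:  # allows the formatting helper to run without PyTorch
    torch = None

BYTES_PER_GIB = 1024 ** 3


def format_peak(num_bytes):
    """Format a byte count the same way as the print statement in the issue."""
    return "max mem: {:.3f} GB".format(num_bytes / BYTES_PER_GIB)


def log_and_reset_peak(device=None):
    """Print peak allocated GPU memory since the last reset, then reset it.

    Returns the peak in bytes, or None when PyTorch or CUDA is unavailable.
    (Hypothetical helper, not a torchcv function.)
    """
    if torch is None or not torch.cuda.is_available():
        return None
    peak = torch.cuda.max_memory_allocated(device)
    print(format_peak(peak))
    # Reset so the next call reports the peak of the next interval only.
    torch.cuda.reset_peak_memory_stats(device)
    return peak
```

Calling `log_and_reset_peak()` once per forward pass or per iteration should give per-interval peaks like the ones listed above, rather than a running maximum over the whole run.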
wondervictor commented
I've solved it. The cuDNN benchmark mode causes large memory consumption at the beginning of training, which can even lead to OOM.
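A minimal sketch of the workaround, assuming the flag in question is PyTorch's `torch.backends.cudnn.benchmark`. With benchmarking enabled, cuDNN tries several convolution algorithms for each new input shape, and some of them need large temporary workspaces, which would be consistent with the high peak in the first iteration. The helper name `disable_cudnn_benchmark` is hypothetical. Where torchcv itself sets this flag is not shown in this thread, so you may need to change its config instead of calling this directly.

```python
try:
    import torch
except ImportError:  # keeps the sketch importable without PyTorch
    torch = None


def disable_cudnn_benchmark():
    """Turn off cuDNN autotuning; call before the first forward pass.

    Returns the resulting flag value, or None when PyTorch is not installed.
    (Hypothetical helper, not a torchcv function.)
    """
    if torch is None:
        return None
    # With benchmark off, cuDNN picks an algorithm heuristically instead of
    # timing several candidates, some of which allocate large workspaces.
    torch.backends.cudnn.benchmark = False
    return torch.backends.cudnn.benchmark
```

The trade-off is that training may run somewhat slower with benchmarking off, since cuDNN no longer selects the fastest algorithm by measurement.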