MONet does not save the memory used by PyTorch
merrymercy opened this issue · 2 comments
Hi, thanks for the awesome library.
How do you measure the used memory? Is it empirically measured or theoretically computed?
I measured the memory usage with nvidia-smi and found that MONet does not reduce the memory used by PyTorch.
First, I ran the 10 GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_10.00.pkl. The peak memory reported by nvidia-smi is around 12 GB.
Then, I ran the 6 GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_6.00.pkl. The peak memory reported by nvidia-smi is still around 12 GB.
How can I use MONet to actually reduce the memory used by PyTorch?
Thanks for the comments.
We measured the used memory with torch.cuda.memory_allocated, which gives the total memory used by tensors in PyTorch. nvidia-smi, on the other hand, shows the total GPU memory used by the system. The two differ because PyTorch's caching memory allocator does not release GPU memory back to the system after tensors are deallocated. While this design works well for DL workloads in general, it is not ideal for checkpointing, where the memory of deallocated tensors would need to be explicitly returned to the system in most cases.
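For reference, here is a minimal sketch of how the two numbers can be compared from inside the training script. report_gpu_memory is a hypothetical helper, not part of MONet; torch.cuda.memory_allocated, torch.cuda.max_memory_allocated, and torch.cuda.memory_reserved are standard PyTorch calls, and the reserved figure is roughly what nvidia-smi shows (plus CUDA context overhead):

import torch

def report_gpu_memory(tag=""):
    # Hypothetical helper for illustration.
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated() / gib   # memory held by live tensors (what MONet reports)
    peak = torch.cuda.max_memory_allocated() / gib    # peak of the above since the last reset
    reserved = torch.cuda.memory_reserved() / gib     # memory held by the caching allocator (~ nvidia-smi)
    print(f"[{tag}] allocated {allocated:.2f} GiB | peak {peak:.2f} GiB | reserved {reserved:.2f} GiB")

# e.g. call report_gpu_memory("after backward") once per iteration to see the gap.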
We noticed that allocating a tensor pool the size of the expected memory usage goes a long way in bringing the actual system memory used close to the total memory used by PyTorch tensors. This can be done from the Python code by adding the following lines before the training loop:
# expected_memory is in bytes; float32 elements are 4 bytes each
pool = torch.zeros(expected_memory // 4).cuda()
del pool  # the freed block stays in PyTorch's cache and is reused by later allocations
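
For example, targeting the 6 GB solution, this could look like the following before the training loop (the 6 GiB budget and variable name are illustrative assumptions, not part of the MONet API):

import torch

expected_memory = 6 * 1024 ** 3                   # illustrative budget in bytes (6 GiB solution)
pool = torch.zeros(expected_memory // 4).cuda()   # one float32 element per 4 bytes
del pool                                          # tensor is freed, but PyTorch keeps the block cached

# ... training loop runs here; allocations are served from the cached block,
# so nvidia-smi should stay close to the pre-allocated budget.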
Added an explanation about this in the README. Closing this issue now.