GPU memory profiling
QiJune commented
We are trying to compare the GPU memory consumption of GoTorch and PyTorch on the ResNet-50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet.
The GPU is a P100 with 16 GB of memory.
Experiment 1:
The following results were measured with the `nvidia-smi` command.
| | Only Forward | Forward and Backward |
| --- | --- | --- |
| PyTorch | 3719 MiB | 2545 MiB |
| GoTorch | 2447 MiB | 2767 MiB |
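For reference, the `nvidia-smi` figure can also be read programmatically. This is only a minimal sketch assuming device 0; the exact invocation behind the numbers above is not shown in this issue, and this query returns the whole-device total rather than a per-process value:

```python
import subprocess

# Query the total GPU memory currently in use on device 0, in MiB.
# The default `nvidia-smi` output additionally lists per-process usage.
out = subprocess.run(
    ["nvidia-smi", "--id=0",
     "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)
print(f"GPU memory used: {out.stdout.strip()} MiB")
```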
In the Only Forward scenario, we remove these three lines of code (the sketch below shows where they sit in a training step):
```python
# optimizer.zero_grad()
# loss.backward()
# optimizer.step()
```
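For context, this is roughly the shape of the step being measured. It is a minimal sketch, not the actual script in example/resnet; the model, loss, optimizer, and the `forward_only` flag are placeholders for illustration:

```python
import torch
import torchvision

device = torch.device("cuda")
# Placeholder setup; the real script builds ResNet-50 and an ImageNet-style data loader.
model = torchvision.models.resnet50().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(images, labels, forward_only=False):
    images, labels = images.to(device), labels.to(device)
    # Without torch.no_grad(), autograd still records the graph during the forward pass.
    output = model(images)
    loss = criterion(output, labels)
    if not forward_only:
        # The three lines removed in the "Only Forward" scenario.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()
```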
Experiment 2:
GPU memory usage with different batch sizes:
| Batch Size | 16 | 128 | 160 |
| --- | --- | --- | --- |
| PyTorch | 2545 MiB | 13161 MiB | 15295 MiB |
| GoTorch | 2767 MiB | 14755 MiB | OOM |
QiJune commented
According to this answer, https://discuss.pytorch.org/t/how-to-delete-pytorch-objects-correctly-from-memory/947, the GPU memory consumption reported by `nvidia-smi` is not accurate.
sneaxiy commented
- We can use `torch.cuda.max_memory_allocated()` to get the peak GPU memory actually occupied by tensors.
- We can use `torch.cuda.empty_cache()` to release the memory that is held by the auto-growth caching allocator but not occupied by any tensor. After that, the memory consumption reported by `nvidia-smi` is accurate (see the sketch below).
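A minimal sketch of both suggestions; the training step being measured is a placeholder here:

```python
import torch

def report_gpu_memory(tag):
    # Memory currently occupied by live tensors.
    current = torch.cuda.memory_allocated()
    # Peak tensor memory since the start of the program
    # (or since the last torch.cuda.reset_peak_memory_stats() call).
    peak = torch.cuda.max_memory_allocated()
    print(f"{tag}: current={current / 2**20:.1f} MiB, peak={peak / 2**20:.1f} MiB")

# ... run a training step here (placeholder) ...
report_gpu_memory("after one step")

# Return cached blocks that no tensor is using back to the driver,
# so the number shown by nvidia-smi gets closer to the real tensor usage.
torch.cuda.empty_cache()
```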