wangkuiyi/gotorch

GPU memory profiling


We are trying to compare the GPU memory consumption of GoTorch and PyTorch with the ResNet-50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet.

The GPU is a P100 with 16 GB of memory.

Experiment 1:

The following results are measured with the nvidia-smi command.

            Only Forward    Forward and Backward
PyTorch     3719 MiB        2545 MiB
GoTorch     2447 MiB        2767 MiB

We removed the following three lines of code in the Only Forward scenario:

# optimizer.zero_grad()
# loss.backward()
# optimizer.step()
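
For reference, here is a minimal sketch of the kind of training loop being measured, assuming torchvision's resnet50 and random stand-in data; the actual script under example/resnet may differ in its details:

import torch
import torch.nn.functional as F
import torchvision

device = torch.device("cuda")
model = torchvision.models.resnet50().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Random stand-in data; the real script loads an image dataset.
images = torch.randn(16, 3, 224, 224, device=device)
labels = torch.randint(0, 1000, (16,), device=device)

for step in range(100):
    output = model(images)
    loss = F.cross_entropy(output, labels)
    # These are the three lines commented out in the Only Forward scenario.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()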

Experiment 2:

GPU memory consumption with different batch sizes:

            Batch Size 16   Batch Size 128   Batch Size 160
PyTorch     2545 MiB        13161 MiB        15295 MiB
GoTorch     2767 MiB        14755 MiB        OOM

According to this answer, https://discuss.pytorch.org/t/how-to-delete-pytorch-objects-correctly-from-memory/947, the GPU memory consumption reported by nvidia-smi is not an accurate measure of the memory actually used by tensors:

  • We can use torch.cuda.max_memory_allocated() to get the peak GPU memory actually occupied by tensors.
  • We can use torch.cuda.empty_cache() to release memory that is held by the caching allocator but not occupied by tensors; after that, the value reported by nvidia-smi is close to the actual tensor usage (see the sketch below).
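
A minimal sketch of how these calls could be wired into the PyTorch script to report memory more precisely; report_gpu_memory is an illustrative helper, not part of the example script:

import torch

def report_gpu_memory(tag):
    # Peak memory actually occupied by tensors since the last reset.
    peak = torch.cuda.max_memory_allocated() / 2**20
    # Memory currently held by PyTorch's caching allocator, which is
    # roughly what nvidia-smi reports on top of the CUDA context.
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: peak allocated {peak:.0f} MiB, reserved {reserved:.0f} MiB")

# ... run training steps here ...
report_gpu_memory("after training")

# Return unused cached blocks to the driver so that nvidia-smi shows
# only the memory that tensors still occupy (plus the CUDA context).
torch.cuda.empty_cache()
report_gpu_memory("after empty_cache")

# Reset the peak counter before measuring the next phase.
torch.cuda.reset_peak_memory_stats()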