tool for computation and memory statistics
yassouali opened this issue · 2 comments
yassouali commented
First of all thank you for this wonderful work, and for providing the implementation.
Feel free to close this issue if this is not the place to ask such questions. I really liked your computation and memory statistics, and I was wondering what tool you used to obtain them.
Thank you very much
MendelXu commented
Please refer to here for all details.
- For memory statistics, we use `torch.cuda.max_memory_allocated()`. This function always returns the maximum GPU memory allocated over its lifetime, so we have to measure the modules in increasing order of memory usage to make sure the reported value actually corresponds to the current module (a rough code sketch follows after this list).
- For GFLOPs statistics, we use torchstat. However, since some operations are not covered by the original repository, we add them ourselves, such as `MatMul` and `AdaptiveAvgPool`.
- For total time statistics, we simply measure the time of the forward pass. To get a more accurate result, we call `torch.cuda.synchronize()` before reading the clock and run several rounds to get the average time.
- For per-operation time details, we use `torch.autograd.profiler.profile()`.
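Roughly, the measurements look like this (a simplified sketch, not our exact script; the model and input size below are placeholders, and on recent PyTorch versions `torch.cuda.reset_peak_memory_stats()` can be used to reset the peak counter instead of ordering the modules by memory):

```python
import time

import torch
import torchvision

# Placeholder model and input: any module / input size can be substituted.
model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

# --- Peak GPU memory of one forward pass ---
# max_memory_allocated() reports the peak since the last reset, so reset the
# counter right before the forward pass being measured.
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    model(x)
peak_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
print(f"peak memory: {peak_mb:.1f} MB")

# --- Average forward time ---
# CUDA kernels run asynchronously, so synchronize before reading the clock,
# and average over several rounds after a short warm-up.
n_warmup, n_rounds = 10, 50
with torch.no_grad():
    for _ in range(n_warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_rounds):
        model(x)
    torch.cuda.synchronize()
elapsed_ms = (time.time() - start) / n_rounds * 1000
print(f"avg forward time: {elapsed_ms:.2f} ms")

# --- Per-operation time details ---
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    with torch.no_grad():
        model(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

For the GFLOPs numbers, torchstat is invoked as `from torchstat import stat; stat(model, (3, 224, 224))`, but counting the extra operations mentioned above requires the modifications we made on top of it.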
I hope it's clear for you.
yassouali commented
Thank you very much for your response.