frgfm/torch-scan

Negative values for memory overhead when testing on smaller networks

EliasVansteenkiste opened this issue · 5 comments

When I test torch-scan on small networks, I get negative values for Framework & CUDA overhead and Total RAM usage.

Any idea how to fix it?

Thanks in advance

```
Trainable params: 47,073
Non-trainable params: 0
Total params: 47,073
---------------------------------------------------------------------------------------------
Model size (params + buffers): 0.18 Mb
Framework & CUDA overhead: -24.64 Mb
Total RAM usage: -24.46 Mb
---------------------------------------------------------------------------------------------
Floating Point Operations on forward: 67.61 MFLOPs
Multiply-Accumulations on forward: 34.70 MMACs
Direct memory accesses on forward: 34.57 MDMAs
```
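
(For context, a summary like the one above is typically produced by a call along these lines. The small CNN below is purely illustrative, since the actual architecture was not shared in this issue, and it assumes a CUDA-capable GPU is available.)

```python
# Illustrative stand-in only: NOT the reporter's actual network.
import torch.nn as nn
from torchscan import summary

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
).eval().cuda()  # assumes a GPU; drop .cuda() to run on CPU

# Prints the per-layer table plus aggregate lines like those quoted above.
summary(model, (3, 32, 32))
```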
frgfm commented

Hi @EliasVansteenkiste!

Thanks for reporting this! This is odd. Could you share a minimal code snippet that reproduces it, please (architecture included)?

I suspect the RAM overhead computation failed because of an issue with nvidia-smi, but I'd need to be able to reproduce the error to investigate 🙏
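
(For reference, a rough sketch of that hypothesis follows; it is not torchscan's actual implementation, and `process_gpu_ram_mb` is a hypothetical helper. It only illustrates how a failed or under-reporting nvidia-smi lookup could push the reported overhead below zero.)

```python
import os
import subprocess


def process_gpu_ram_mb(pid: int) -> float:
    """Return the GPU memory (in MB) nvidia-smi attributes to `pid`, or 0.0 on failure."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        for line in out.splitlines():
            p, mem = (field.strip() for field in line.split(","))
            if int(p) == pid:
                return float(mem)
    except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
        pass
    return 0.0  # a failed query silently collapses to zero


model_size_mb = 0.18  # params + buffers, taken from the summary above
measured_mb = process_gpu_ram_mb(os.getpid())
# If nvidia-smi fails or under-reports, measured_mb < model_size_mb and the
# "overhead" comes out negative, similar to what the report shows.
overhead_mb = measured_mb - model_size_mb
```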

frgfm commented

ping @EliasVansteenkiste 🙏

frgfm commented

Hello @EliasVansteenkiste 👋
It's been quite a while. If I can't reproduce the error, I can't do much, so would you mind sharing how to reproduce it?

frgfm commented

I'm closing this issue since, unfortunately, I don't have any way of reproducing it :/
@EliasVansteenkiste, if you have time at some point, please post more details 🙏