frgfm/torch-scan

Negative values for memory overhead when testing on smaller networks

EliasVansteenkiste opened this issue · 5 comments

When I test torch-scan on small networks, I get negative values for Framework & CUDA overhead and Total RAM usage.

Any idea how to fix it?

Thanks in advance

```
Trainable params: 47,073
Non-trainable params: 0
Total params: 47,073
---------------------------------------------------------------------------------------------
Model size (params + buffers): 0.18 Mb
Framework & CUDA overhead: -24.64 Mb
Total RAM usage: -24.46 Mb
---------------------------------------------------------------------------------------------
Floating Point Operations on forward: 67.61 MFLOPs
Multiply-Accumulations on forward: 34.70 MMACs
Direct memory accesses on forward: 34.57 MDMAs
```
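
(For context, a summary like the one above is typically produced by a call along these lines. The small CNN below is purely illustrative, since the actual architecture was not shared in this issue, and it assumes a CUDA-capable GPU is available.)

```python
# Illustrative stand-in only: NOT the reporter's actual network.
import torch.nn as nn
from torchscan import summary

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
).eval().cuda()  # assumes a GPU; drop .cuda() to run on CPU

# Prints the per-layer table plus aggregate lines like those quoted above.
summary(model, (3, 32, 32))
```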
frgfm commented

Hi @EliasVansteenkiste!

Thanks for reporting this! This is odd. Could you share a minimal code snippet that reproduces it, please (architecture included)?

I suspect the RAM overhead computation failed because of an issue with nvidia-smi, but I'd need to be able to reproduce the error to investigate 🙏
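
(For reference, a rough sketch of that hypothesis follows; it is not torchscan's actual implementation, and `process_gpu_ram_mb` is a hypothetical helper. It only illustrates how a failed or under-reporting nvidia-smi lookup could push the reported overhead below zero.)

```python
import os
import subprocess


def process_gpu_ram_mb(pid: int) -> float:
    """Return the GPU memory (in MB) nvidia-smi attributes to `pid`, or 0.0 on failure."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        for line in out.splitlines():
            p, mem = (field.strip() for field in line.split(","))
            if int(p) == pid:
                return float(mem)
    except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
        pass
    return 0.0  # a failed query silently collapses to zero


model_size_mb = 0.18  # params + buffers, taken from the summary above
measured_mb = process_gpu_ram_mb(os.getpid())
# If nvidia-smi fails or under-reports, measured_mb < model_size_mb and the
# "overhead" comes out negative, similar to what the report shows.
overhead_mb = measured_mb - model_size_mb
```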

frgfm commented

ping @EliasVansteenkiste 🙏

frgfm commented

Hello @EliasVansteenkiste 👋
It's been quite a while. If I can't reproduce the error, I can't do much, so would you mind sharing how to reproduce it?

frgfm commented

I'm closing this issue since, unfortunately, I don't have any way of reproducing it :/
@EliasVansteenkiste, if you have time at some point, please post more details 🙏