yinboc/prototypical-network-pytorch

about the memory

Closed this issue · 5 comments

Hi, thanks for your source code. However, when I run the 1-shot code in parallel on two GTX 1080 Ti GPUs, I still get "CUDA error: out of memory". All I can do is reduce "train-way" and "query", but I am afraid the results would drop. Could you give me some advice for the case where I only have two GTX 1080 Ti? And how many GPUs did you use when you ran this code? Thanks a lot!

Hi. I trained the model with a single GTX 1080 Ti, if I remember correctly.
I have also just improved the memory usage in train.py and test.py; you could give it a try and see if it works better.

Thanks for your reply! Yes, you are right: I ran your source code again and it no longer throws any error. I am wondering if you could tell me the details of how you improved the memory usage? My code just imitates yours, but when I run it, it always raises "CUDA error: out of memory", and I know the bug is in the for loop
"for i, batch in enumerate(val_loader, 1):"
but when I compare my code with yours, I can't find any differences. I would really appreciate it if you could give me more information on how to improve the memory usage. Thank you again!

You are welcome.
I just added proto = None; logits = None; loss = None at the end of each loop, so that the gradient graph is freed before the next iteration.
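For anyone else reading this, here is a rough sketch of where that goes in the training loop (the surrounding names such as model, euclidean_metric, train_loader, p, shot, train_way, label and optimizer follow train.py but are only sketched here, not copied exactly):

```python
import torch
import torch.nn.functional as F

for i, batch in enumerate(train_loader, 1):
    data, _ = [x.cuda() for x in batch]
    data_shot, data_query = data[:p], data[p:]

    # The forward pass builds an autograd graph rooted at these tensors.
    proto = model(data_shot).reshape(shot, train_way, -1).mean(dim=0)
    logits = euclidean_metric(model(data_query), proto)
    loss = F.cross_entropy(logits, label)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Dropping the references lets PyTorch free the graph (and the
    # activations it holds) before the next iteration builds a new one.
    proto = None; logits = None; loss = None
```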

Sorry to disturb you, but I just want to share my bug with you. I found that what caused my GPU memory to surge was this part of my own code:
https://github.com/cyvius96/prototypical-network-pytorch/blob/master/train.py#L117
I mistakenly wrote "vl.add(loss)" without the ".item()".
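In other words (vl here is the running-average helper from train.py; the exact wrapper doesn't matter for the point):

```python
# Wrong: stores the loss tensor itself, which keeps its whole autograd
# graph alive across iterations, so GPU memory keeps growing:
vl.add(loss)

# Right: stores only the Python float, so the graph can be freed:
vl.add(loss.item())
```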

Thanks, that is indeed a common bug :D