vanstorm9/yugioh-one-shot-learning

Running out of RAM during execution of Train.py

Outcats opened this issue · 5 comments

Hi,

I have tried executing the train.py script both in the Windows Subsystem for Linux and directly in a Linux VM, but each time the execution gets so far and is then killed by the OS:

[screenshot of the process being killed]

I don't know much about Python, unfortunately, and was having issues executing it directly on Windows, so I have been trying the Linux VM option. Is there anything that can be done, or that you can recommend, to make the execution more efficient?

My system has 16 GB of RAM and I have about 6 GB allocated to the VM. Is that not enough?
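For what it's worth, here is a minimal sketch (not from the repo) of checking how much memory the VM actually exposes to Python, assuming the third-party psutil package is installed:

```python
# Quick check of total vs. available memory inside the VM
# (assumes `pip install psutil`; not part of train.py).
import psutil

mem = psutil.virtual_memory()
print(f"total: {mem.total / 1024**3:.1f} GB, available: {mem.available / 1024**3:.1f} GB")
```

If the available figure is well under the 6 GB allocated, other processes in the VM are already eating into the headroom train.py has to work with.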

It sounds like there may be a memory issue. When you were running it in the Windows Subsystem for Linux, was the script running on your CPU or your GPU? If you have a GPU, what kind of GPU is it and what are its specs?

Hey, thanks for the reply. Unfortunately this laptop doesn't have a GPU, so it is all CPU-based. I have been trying to wrap my head around the code, as I don't know a lot about Python and nothing about machine learning, but I worked out that I needed to drop the batch size and the number of epochs, and I have been able to run train.py successfully over the course of 2.5 days using the settings below:

Batch size: 9
Epochs: 5
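For anyone hitting the same limit, here is a minimal, self-contained sketch of where those two values typically sit in a Keras-style training call; the model and data are toy placeholders, and the repo's actual train.py may use a different framework or structure:

```python
# Toy illustration only (assumes TensorFlow/Keras is installed; not the repo's code).
# The point is the fit() call: batch_size drives peak memory per step,
# while epochs mostly drives total run time.
import numpy as np
from tensorflow import keras

x = np.random.rand(256, 32).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

model = keras.Sequential([
    keras.layers.Input(shape=(32,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(x, y, batch_size=9, epochs=5, verbose=0)
```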

I will close this issue down if I can (if not, feel free to), as I have worked out what the issue is and how to get around it on my end. It wasn't your code but my understanding, or lack thereof!

Closed, as the issue was on my end, not the script's.

The batch size is definitely correlated with the script getting killed, but the number of epochs usually has no relation (unless there is a memory leak building up over time).
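One hedged way to check for a leak across epochs (as opposed to a one-off spike from the batch size) is to log the process's resident memory once per epoch; this uses the third-party psutil package and is not part of the repo's code:

```python
# Call once per epoch; a figure that climbs steadily epoch after epoch
# points to a leak, a flat one does not (assumes `pip install psutil`).
import os
import psutil

def log_memory(epoch: int) -> None:
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)
    print(f"epoch {epoch}: resident memory {rss_mb:.0f} MB")
```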

Try switching the batch size to 8; it is better for the batch size to be a power of 2 (e.g. 2^2 = 4, 2^3 = 8, 2^4 = 16, 2^5 = 32, etc.).
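If you want to snap an arbitrary value down to a power of two rather than pick one by hand, a small illustrative helper:

```python
def nearest_pow2_below(n: int) -> int:
    """Largest power of two that is <= n, e.g. 9 -> 8, 32 -> 32."""
    return 1 << (n.bit_length() - 1) if n > 0 else 1

print(nearest_pow2_below(9))   # 8
print(nearest_pow2_below(32))  # 32
```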

Good tip on the batch size, thanks! The number of epochs was lowered for execution-time reasons rather than performance reasons, however.