inoryy/tensorflow2-deep-reinforcement-learning

Memory leak issue

52hpfans opened this issue · 2 comments

Hello,
Thanks for your great work, I am trying to learn TensorFlow and Keras from your tutorial code. When I used this A2C code to tackle the "MountainCar-v0" problem, I found that the code's RAM usage increased after each episode. After about 800 episodes the code exits with an OOM error (my laptop has 16 GB of RAM). I suspect the cause is the Keras model API calls inside train(): APIs such as model.predict and tf.convert_to_tensor add nodes to the graph, so the graph grows on every iteration.
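To make it concrete, here is a minimal sketch of the pattern I mean (the model, shapes, and names below are placeholders, not your actual tutorial code): calling model.predict inside the per-episode loop is where I see RAM creep up.

```python
import numpy as np
import tensorflow as tf

# Placeholder model standing in for the A2C network (not the tutorial's actual model).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(3),
])

obs = np.zeros((1, 2), dtype=np.float32)
for episode in range(1000):
    # Called thousands of times like this, model.predict keeps accumulating
    # state in the affected TF2 versions, so RAM grows a little every episode
    # instead of staying flat.
    logits = model.predict(obs)
```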

I tried using tf.keras.backend.clear_session() to mitigate this memory leak; it seems to work, but I don't know whether it affects model training. Nearly all the RL code based on TF2 and Keras that I have seen has a similar memory leak, so could you try to fix this issue? Thanks a lot.
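For reference, the mitigation I tried looks roughly like this (again only a sketch); it releases the accumulated graph state, but I'm not sure it is safe to call in the middle of training:

```python
import tensorflow as tf

# Workaround I tried between episodes (illustrative): clear the global Keras
# state so the accumulated graph nodes are released. My concern is that this
# also resets other global state and might interfere with the ongoing training.
tf.keras.backend.clear_session()
```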

Hello,

Yep, somebody on the blog also commented on it. From what I've gathered it's a bug in TF core, and the current workaround is to use model.predict_on_batch instead of model.predict.
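Roughly, the swap looks like this (a sketch with a stand-in two-head model, not the exact tutorial code; only the call site changes):

```python
import numpy as np
import tensorflow as tf

# Stand-in A2C-style model with two heads (policy logits and value); the real
# tutorial model differs, this is only to illustrate the call-site change.
inputs = tf.keras.Input(shape=(2,))
x = tf.keras.layers.Dense(64, activation='relu')(inputs)
logits = tf.keras.layers.Dense(3, name='policy_logits')(x)
value = tf.keras.layers.Dense(1, name='value')(x)
model = tf.keras.Model(inputs, [logits, value])

obs = np.zeros((1, 2), dtype=np.float32)

# Before (leaks in the affected TF2 versions):
#   action_logits, state_value = model.predict(obs)
# After (memory stays flat across episodes):
action_logits, state_value = model.predict_on_batch(obs)
```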

Try to avoid using clear_session(), as it can indeed cause all sorts of issues with keeping the optimization process persistent.

I'll check if there's a proper fix in the works and if not I'll update the blog post with the workaround. I'll keep the ticket open for now.

Seems the issue is still present in TF 2.1.
I've updated the code to use model.predict_on_batch, which avoids the memory leak.