minimaxir/gpt-2-cloud-run

Reduce memory consumption to prevent errors due to container OOM

minimaxir opened this issue · 2 comments

Containers seem to go OOM after ~10 generations, despite garbage collection. Loading the model takes up ~1.5 GB, so hitting the memory ceiling is not surprising, but there should be a way to control the leaks.

The current implementation (reloading the model after every 8 generations) appears to avoid OOMs; a sketch of the pattern is below.
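A minimal sketch of that reload pattern, assuming the server uses gpt_2_simple's session API (`start_tf_sess`, `load_gpt2`, `generate`, `reset_session`); the counter name and reload threshold here are illustrative, not the repo's actual variables:

```python
import gc
import gpt_2_simple as gpt2

RELOAD_EVERY = 8  # rebuild the TF graph/session after this many generations

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)
generate_count = 0

def generate_text(**kwargs):
    """Generate text, resetting the TF session periodically to cap memory growth."""
    global sess, generate_count
    text = gpt2.generate(sess, return_as_list=True, **kwargs)[0]
    generate_count += 1
    if generate_count >= RELOAD_EVERY:
        # Tear down the graph/session and reload the model to release leaked memory.
        sess = gpt2.reset_session(sess)
        gpt2.load_gpt2(sess)
        generate_count = 0
        gc.collect()
    return text
```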

This will reduce memory consumption by a lot: tensor2tensor.utils.adafactor.AdafactorOptimizer
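Adafactor mainly saves memory during fine-tuning, since it keeps factored second-moment estimates rather than Adam's full per-parameter moment tensors. A minimal sketch of swapping it in, assuming TF 1.x with tensor2tensor installed; the toy loss stands in for the actual GPT-2 training loss:

```python
import tensorflow as tf
from tensor2tensor.utils.adafactor import AdafactorOptimizer

# Toy variable and loss standing in for the GPT-2 fine-tuning graph.
w = tf.get_variable("w", shape=[1024, 1024])
loss = tf.reduce_sum(tf.square(w))

# Adafactor stores far less optimizer state than Adam, lowering peak memory.
optimizer = AdafactorOptimizer()
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)  # one training step; loop over this when fine-tuning
```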