
GPU Memory Error

sac3tf opened this issue

Hello,

When attempting to run this notebook on Google Colab with a GPU allocated (free account), I consistently received a "cpu to gpu memory limit exceeded" error, and my entire Colab session would crash. If I connected to a CPU-only instance instead, the notebook ran successfully, but training took far too long. I ended up paying for a Colab Pro membership to get access to a "premium" GPU and more RAM; after that I was able to run the notebook on a GPU without failure.

We are facing a similar issue when trying to follow this notebook in an internal environment on SageMaker. Can someone provide tips for working around GPU memory errors with this notebook in general? We are using a batch size of 32, with each item in the batch containing roughly 115 tokens. Is lowering the batch size the only way to counter the memory error? We have not changed anything significant in the notebook other than the data we run it on, but since I hit a similar error just running the notebook as-is in Colab, I am wondering whether something can be done to make it more GPU efficient.
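
For reference, this is the kind of TensorFlow-level mitigation we are experimenting with on our side. It is only a sketch of standard, documented TF options, not anything taken from this notebook:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving (nearly) all of it
# up front. This must run before any op touches the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Mixed precision: compute in float16, keep variables in float32.
# This roughly halves activation memory on GPUs that support it; the
# final logits/softmax layer should stay float32 for numerical stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")
```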
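
Partly answering my own question about batch size: my understanding is that gradient accumulation can keep the effective batch at 32 while only holding the activations of a smaller micro-batch in memory at once. Below is a hypothetical sketch, where `model`, `optimizer`, and `loss_fn` are placeholders rather than names from the notebook, and every trainable variable is assumed to receive a gradient:

```python
import tensorflow as tf

def accumulated_train_step(model, optimizer, loss_fn,
                           batch_x, batch_y, accum_steps=4):
    # Split a batch of 32 into 4 micro-batches of 8 (the batch size
    # must be divisible by accum_steps).
    micro_x = tf.split(batch_x, accum_steps)
    micro_y = tf.split(batch_y, accum_steps)
    accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]
    total_loss = 0.0
    for mx, my in zip(micro_x, micro_y):
        with tf.GradientTape() as tape:
            # Scale each micro-batch loss so the summed gradient
            # matches a single full-batch update.
            loss = loss_fn(my, model(mx, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accum_grads = [a + g for a, g in zip(accum_grads, grads)]
        total_loss += loss
    # One optimizer update for the whole (effective) batch.
    optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
    return total_loss
```

Does that sound like a reasonable direction, or is there something notebook-specific we should change instead? Thank you!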