bentrevett/pytorch-sentiment-analysis

Upgraded Sentiment Analysis - Model train cell seems never end

Arthur-Zhong opened this issue · 3 comments

Hi, I ran the notebook cell by cell, both on a local machine with an RTX 2080 and on Google Colab. Every time I run the model training cell (the one with 'N_EPOCHS = 5'), it keeps running but never produces any output, even after 30 minutes. However, the results in the tutorial show only about 30 seconds to train 1 epoch.

I've just run the notebook on Google Colab and the training cell runs fine, taking around 41 seconds per epoch.

Are you sure you are using the GPU in Colab? You need to go to Runtime > Change runtime type, then set Hardware accelerator to GPU and Runtime shape to High RAM.

I'll check on my desktop GPU later today.

I tried this on Colab and it works! Thank you! However, the issue persists on my local machine. I monitored the GPU's 'CUDA' indicator while training the model, and CUDA usage was at 100%. I tested my local environment by running the tutorial 1 script, and it worked as expected. I'm not sure why the second tutorial doesn't work on my local machine.

I've also found that the notebook works fine on my desktop GPU.

Are you definitely using your GPU? Make sure to print the device returned by torch.device(...) and ensure that it prints "cuda".
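
For reference, here is a minimal sketch of that check. It assumes the notebook picks its device with the usual `torch.cuda.is_available()` pattern. The `layer` below is a hypothetical stand-in for the tutorial's model, not code from the notebook.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)  # should show "cuda" on a working GPU setup

if device.type == 'cuda':
    # Name of the first visible GPU, e.g. to confirm it is the expected card.
    print(torch.cuda.get_device_name(0))

# Hypothetical stand-in for the real model: after .to(device), the
# parameters report the device they actually live on.
layer = torch.nn.Linear(4, 2).to(device)
param_device = next(layer.parameters()).device
print(param_device)
```

If the first line printed is "cpu", PyTorch cannot see the GPU and training will run on the CPU, which could explain a very long epoch.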