Training chromBPNet takes too much time
lzj1769 opened this issue · 4 comments
lzj1769 commented
akundaje commented
Are you using a GPU? What type of GPU?
It should not take that long if you are using a decent GPU, e.g. a V100 or A100.
Anshul
On Fri, Dec 8, 2023, 5:39 PM Zhijian Li wrote:
Hi,
I am currently training a chromBPNet for an ATAC-seq sample with ~200K
peaks. However, it takes ~12 hours for one epoch.
See the screenshot below:
[Screenshot 2023-12-08 at 20.38.24: <https://github.com/kundajelab/chrombpnet/assets/9947922/5af62369-fac3-4959-a1a8-d9f16dc07223>]
So I want to ask how to make the training process faster.
Any ideas are appreciated.
Thanks,
Zhijian
lzj1769 commented
Hi Anshul,
I think I found the problem.
For some reason, TensorFlow wasn't using the GPU properly. After fixing this, everything looks great!
Thanks,
Zhijian
hermandebeukelaer commented
Hi @lzj1769, I am running into a similar performance problem. Training the model from the tutorial is very slow, no output yet after more than 1 hour execution time. How did you discover TensorFlow wasn't using the GPU properly and how did you fix it?
lzj1769 commented
You can check whether TensorFlow is actually using the GPU with the command `nvidia-smi`: while training is running, your Python process should appear in the process list with nonzero GPU memory and utilization.
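You can also ask TensorFlow directly whether it sees any GPU device. A minimal sketch (assuming TensorFlow is installed in the training environment; the `visible_gpus` helper name is just for illustration):

```python
# Quick sanity check: does TensorFlow see any GPU?
def visible_gpus():
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed at all
    # Lists physical GPU devices TensorFlow can use; empty list
    # means training will silently fall back to CPU.
    return tf.config.list_physical_devices('GPU')

gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed")
elif not gpus:
    print("TensorFlow is installed but sees NO GPU -- training will run on CPU")
else:
    print(f"TensorFlow sees {len(gpus)} GPU(s):", gpus)
```

If this prints an empty list even though `nvidia-smi` works, the problem is usually a CUDA/cuDNN version mismatch rather than the hardware.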
To fix it, make sure you have installed mutually compatible versions of CUDA, TensorFlow, and cuDNN.
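One way to sanity-check this is against TensorFlow's "tested build configurations" table on tensorflow.org. The sketch below hard-codes a small excerpt of that table; the exact version pairs are assumptions copied from that table, and the `required_cuda_cudnn` helper is hypothetical:

```python
# Illustrative excerpt of TensorFlow's tested GPU build configurations
# (assumed from the "Tested build configurations" table on tensorflow.org).
TESTED_COMBOS = {
    "2.4": ("11.0", "8.0"),   # TF 2.4 -> CUDA 11.0, cuDNN 8.0
    "2.5": ("11.2", "8.1"),   # TF 2.5 -> CUDA 11.2, cuDNN 8.1
    "2.8": ("11.2", "8.1"),   # TF 2.8 -> CUDA 11.2, cuDNN 8.1
    "2.11": ("11.2", "8.1"),  # TF 2.11 -> CUDA 11.2, cuDNN 8.1
}

def required_cuda_cudnn(tf_version: str):
    """Return the (CUDA, cuDNN) pair tested against a TF release, or None."""
    key = ".".join(tf_version.split(".")[:2])  # "2.8.0" -> "2.8"
    return TESTED_COMBOS.get(key)

print(required_cuda_cudnn("2.8.0"))
```

If your installed CUDA or cuDNN does not match the tested pair for your TensorFlow release, TensorFlow may fail to load the GPU libraries and silently fall back to CPU, which produces exactly the kind of multi-hour epochs described above.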