Training chromBPNet takes too much time
lzj1769 opened this issue · 4 comments
lzj1769 commented
akundaje commented
Are you using a GPU? What type of GPU?
It should not take that long if you are using a decent GPU, e.g. a V100 or A100.
Anshul
On Fri, Dec 8, 2023, 5:39 PM Zhijian Li wrote:
Hi,
I am currently training a chromBPNet for an ATAC-seq sample with ~200K
peaks. However, it takes ~12 hours for one epoch.
See the screenshot below:
[Screenshot 2023-12-08 at 20.38.24: <https://github.com/kundajelab/chrombpnet/assets/9947922/5af62369-fac3-4959-a1a8-d9f16dc07223>]
So I want to ask how to make the training process faster.
Any ideas are appreciated.
Thanks,
Zhijian
lzj1769 commented
Hi Anshul,
I think I found the problem.
For some reason, TensorFlow wasn't using the GPU properly. After fixing this, everything looks great!
Thanks,
Zhijian
hermandebeukelaer commented
Hi @lzj1769, I am running into a similar performance problem. Training the model from the tutorial is very slow, no output yet after more than 1 hour execution time. How did you discover TensorFlow wasn't using the GPU properly and how did you fix it?
lzj1769 commented
You can check whether TensorFlow is actually using the GPU with the command `nvidia-smi`: while training is running, your Python process should appear in the process list with nonzero GPU memory and utilization.
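You can also ask TensorFlow directly whether it sees any GPU device. A minimal sketch (assuming TensorFlow is installed in the training environment; the `visible_gpus` helper name is just for illustration):

```python
# Quick sanity check: does TensorFlow see any GPU?
def visible_gpus():
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed at all
    # Lists physical GPU devices TensorFlow can use; empty list
    # means training will silently fall back to CPU.
    return tf.config.list_physical_devices('GPU')

gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed")
elif not gpus:
    print("TensorFlow is installed but sees NO GPU -- training will run on CPU")
else:
    print(f"TensorFlow sees {len(gpus)} GPU(s):", gpus)
```

If this prints an empty list even though `nvidia-smi` works, the problem is usually a CUDA/cuDNN version mismatch rather than the hardware.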
To fix it, make sure you have installed mutually compatible versions of CUDA, TensorFlow, and cuDNN.
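One way to sanity-check this is against TensorFlow's "tested build configurations" table on tensorflow.org. The sketch below hard-codes a small excerpt of that table; the exact version pairs are assumptions copied from that table, and the `required_cuda_cudnn` helper is hypothetical:

```python
# Illustrative excerpt of TensorFlow's tested GPU build configurations
# (assumed from the "Tested build configurations" table on tensorflow.org).
TESTED_COMBOS = {
    "2.4": ("11.0", "8.0"),   # TF 2.4 -> CUDA 11.0, cuDNN 8.0
    "2.5": ("11.2", "8.1"),   # TF 2.5 -> CUDA 11.2, cuDNN 8.1
    "2.8": ("11.2", "8.1"),   # TF 2.8 -> CUDA 11.2, cuDNN 8.1
    "2.11": ("11.2", "8.1"),  # TF 2.11 -> CUDA 11.2, cuDNN 8.1
}

def required_cuda_cudnn(tf_version: str):
    """Return the (CUDA, cuDNN) pair tested against a TF release, or None."""
    key = ".".join(tf_version.split(".")[:2])  # "2.8.0" -> "2.8"
    return TESTED_COMBOS.get(key)

print(required_cuda_cudnn("2.8.0"))
```

If your installed CUDA or cuDNN does not match the tested pair for your TensorFlow release, TensorFlow may fail to load the GPU libraries and silently fall back to CPU, which produces exactly the kind of multi-hour epochs described above.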