bowang-lab/U-Mamba

How to solve the problem of experiments stalling?

YUjh0729 opened this issue · 3 comments

Hello,

When I train the model, training stalls at a certain epoch and never continues. GPU utilization sits at 1% while memory usage stays at 12GB, so the process still appears to be running. However, it stayed stuck on the same epoch for an entire night without making any progress. What could be the problem? Can you help explain this?
Screenshot 2024-05-30 093003

Thank you.

Hi @YUjh0729,
I'm having the same issue as you! Were you able to solve it? Any help would be greatly appreciated.
@JunMa11, any help on this one?
Thank you!

Hi @zcyrique,
I've tried all the solutions suggested in the other issues, but none of them resolved the problem.

Thank you, @YUjh0729, for reaching out. Let's wait for help from anyone who may have solved this issue.