JiachengLi1995/Recformer

Issue with Training Progress Stalling

hyc9 opened this issue · 0 comments

hyc9 commented

Thanks for your great work. I ran into an issue when running the pretraining code on my own data and hope to get your assistance.
Issue Description:
When running the pretraining code, training stalls after a single validation step: the progress bar freezes, and no new model checkpoints or training results are saved. This happens before a full training epoch completes. Notably, GPU utilization stays at 100% the whole time.
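To narrow down where a stall like this sits, one option is to have each Python process periodically dump its stack traces while it appears frozen. The following is only a minimal sketch using the standard-library `faulthandler` module; it is not part of Recformer, and the helper name `hang_watchdog` and the `run_pretraining()` call in the comment are hypothetical placeholders.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s=300.0, file=None):
    """Dump the tracebacks of all threads every `timeout_s` seconds.

    Hypothetical helper for illustration. `file` must be a real file
    object with a file descriptor; it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    # repeat=True keeps dumping at every interval until cancelled,
    # so a long stall produces several snapshots to compare.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
    try:
        yield
    finally:
        # Stop the watchdog once the wrapped block finishes or raises.
        faulthandler.cancel_dump_traceback_later()


# Usage sketch (run_pretraining is a placeholder for the real entry point):
# with hang_watchdog(timeout_s=300.0):
#     run_pretraining()
```

If every dump shows the same frame, that frame is a reasonable place to start looking. With several GPUs, each process would need its own watchdog and ideally its own output file.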

Environment:

  • GPU: 8x Nvidia RTX 3090

Your assistance in resolving this issue is greatly appreciated.