Issue with Training Progress Stalling
hyc9 opened this issue
Thanks for your great work! However, I ran into an issue when running the pretraining code on my own data, and I hope you can help.
Issue Description:
When running the pretraining code, training stalls after a single validation step: the progress bar freezes, and no new model checkpoints or training results are saved. This happens before the first training epoch completes. Notably, GPU utilization stays at 100% throughout the stall.
Environment:
- GPU: 8x Nvidia RTX 3090
Your assistance in resolving this issue is greatly appreciated.