training stuck

passingdragon opened this issue 16 days ago · 3 comments

Search before asking

I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

I used your framework to modify YOLO and define my own model, but training gets stuck right at the start, at the position shown below. Why does this happen?

Answer 1 · 2024-12-06T06:36:37.000Z

👋 Hello @passingdragon, thank you for your interest in YOLOv5 🚀! It seems like you're facing an issue with training when using a custom-defined model.

If this is a 🐛 Bug Report, could you please provide a minimum reproducible example (MRE)? This would help us understand and debug the issue much faster. Include details such as the code changes you've made, the exact command you used to start training, and a snippet of the terminal output where it gets stuck.

Additionally, confirm that your environment matches the expected requirements, particularly Python versions (>=3.8.0) and PyTorch versions (>=1.8), and that all dependencies are installed correctly.

YOLOv5 can be run in several environments, including local setups, Google Colab, Docker containers, AWS, or GCP. If relevant, let us know the environment you're using, as well as if you are leveraging a GPU for training.

This is an automated response to assist you as quickly as possible. An Ultralytics engineer will also review your issue and provide further help soon. 😊

Answer 2 · 2024-12-06T06:38:58.000Z

I can run the original source code successfully, but training gets stuck once I modify the model.

Answer 3 · 2024-12-06T10:43:24.000Z

@passingdragon thank you for your question. When encountering training issues after modifying the model, the problem often lies in the custom changes. Please check the following:

1. Ensure your modified model structure aligns with the dimensions and outputs expected by YOLOv5, especially for layers like detection heads.
2. Verify your .yaml configuration for the new model (e.g., nc for the number of classes, and anchors).
3. Debug by running with --verbose to identify specific points where it stalls.
4. Confirm that dependencies are up-to-date, and test with the unmodified base model to ensure the issue is not external.
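
As a rough illustration of the first two checks, the sketch below computes the number of output channels each detection layer should produce, given nc and anchors. It relies on the usual YOLOv5 head layout of (nc + 5) values per anchor. The helper name expected_head_channels is hypothetical and not part of the YOLOv5 codebase, the config is passed as a plain dict rather than loaded from a .yaml file, and anchors is assumed to be a list of per-layer lists of width/height pairs (the values shown are only example numbers).

```python
# Hypothetical helper for illustration; not part of the YOLOv5 codebase.
def expected_head_channels(cfg):
    """Return the output channel count implied by nc and anchors, per detection layer."""
    nc = cfg["nc"]
    anchors = cfg["anchors"]
    if not isinstance(nc, int) or nc < 1:
        raise ValueError(f"nc must be a positive integer, got {nc!r}")
    channels = []
    for i, layer in enumerate(anchors):
        # Each anchor is a (width, height) pair, so the list length must be even.
        if len(layer) == 0 or len(layer) % 2 != 0:
            raise ValueError(f"anchors[{i}] must hold width/height pairs, got {layer!r}")
        na = len(layer) // 2              # anchors on this detection layer
        channels.append(na * (nc + 5))    # x, y, w, h, objectness + nc class scores
    return channels


# Example values only: 80 classes, three detection layers with three anchors each.
cfg = {
    "nc": 80,
    "anchors": [
        [10, 13, 16, 30, 33, 23],
        [30, 61, 62, 45, 59, 119],
        [116, 90, 156, 198, 373, 326],
    ],
}
print(expected_head_channels(cfg))  # [255, 255, 255]
```

If the final convolution feeding each detection layer in a custom model produces a different channel count than this, the head and the config disagree, which is worth resolving before looking elsewhere.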

Let us know if you need further clarification or encounter specific errors!
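
If the run stalls without printing any error, one generic way to see where the process is waiting is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. This is only a sketch: the function names arm_stall_watchdog and disarm_stall_watchdog are hypothetical wrappers, the 60-second default is an arbitrary assumption, and nothing here is specific to YOLOv5.

```python
import faulthandler
import sys


def arm_stall_watchdog(timeout_s=60.0, file=sys.stderr):
    """Dump the tracebacks of all threads every `timeout_s` seconds until disarmed.

    `file` must be a real file object with a file descriptor (stderr or an open file).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disarm_stall_watchdog():
    """Stop the periodic traceback dumps."""
    faulthandler.cancel_dump_traceback_later()
```

Calling arm_stall_watchdog() near the top of the training script and disarm_stall_watchdog() once training finishes means a stuck run prints, at each timeout, the line each thread is currently waiting on. That usually narrows the stall down to a specific layer, data loader call, or synchronization point.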