Model trained on custom data not working after converting to a TensorRT engine
rlrahulkanojia opened this issue · 6 comments
Hello, the repo works great and converts the model without any accuracy drop.
When I tried to train a model using the yolov5 repo (a80dd66efe0bc7fe3772f259260d5b7278aab42f), the loss function threw an error. When I instead use a model trained with v6.1 of the Ultralytics yolov5 repo, the engine outputs NaN values, but only when an object is present.
How can I train a model on custom data and deploy it as an engine file without getting NaN as the output confidence?
Thanks
Rahul
Do you mean you trained from the pretrained v6.1 model and the loss function threw an error? That sounds like a problem in the training pipeline rather than in inference.
Yes, the repo had the training issue, but I fixed it with newer changes from the repo and training converged successfully. However, after converting the model, I get detections but with NaN confidence.
So my questions:
- How did you fix the issue? Did you change the model architecture, the bbox parsing method, or something else?
- What precision did you use for TRT inference? Did you try FP32?
- The main issue was in the loss function definition, which was also present in the main repo. I replaced the ComputeLoss class definition with the one from the next working commit after a80dd66, and it worked.
- I used FP16 precision for TRT inference. I haven't tried FP32 yet; I will update this comment a few hours after trying it out.
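One reason to ask about FP32 is that FP16 has a narrow range (largest finite value 65504): large intermediate values overflow to inf, and operations such as inf - inf then yield NaN. The sketch below is a simplified illustration of that effect in plain Python, not TensorRT code; the helper names to_fp16 and difference are hypothetical, and a Python float stands in for FP32. Whether overflow caused the NaN in this thread is an assumption, not something the thread confirms.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value.

    Finite values too large for FP16 become +/-inf, mimicking how FP16
    arithmetic overflows instead of raising an error.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # struct's "e" format packs/unpacks a 16-bit half-precision float.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def difference(a: float, b: float, half: bool) -> float:
    """Hypothetical intermediate step a - b, optionally computed in FP16."""
    if not half:
        return a - b  # Python float (double) stands in for FP32 here
    a16, b16 = to_fp16(a), to_fp16(b)
    return to_fp16(a16 - b16)  # inf - inf evaluates to nan


# Large activations, e.g. the kind that might only appear when an object is present.
print(difference(90000.0, 70000.0, half=False))  # 20000.0
print(difference(90000.0, 70000.0, half=True))   # nan
```

If an FP32 engine gives finite confidences where FP16 gives NaN, range overflow is a likely suspect; if both give NaN, the cause is more likely elsewhere, such as the export or post-processing configuration.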
Reference:
BTW, did you change the number of classes to match your custom dataset? It is set here: https://github.com/NVIDIA-AI-IOT/yolov5_gpu_optimization/blob/main/0001-Enable-onnx-export-with-batchNMS-plugin.patch#L155
Ah, got it. Thanks for letting me know. Works like a charm.
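Why the class count matters can be sketched with a simplified decoder. It assumes the usual YOLOv5 row layout [x, y, w, h, objectness, score_0, ..., score_{nc-1}], so a row holds 5 + nc values, and that confidence is objectness times the class score, as YOLOv5's NMS computes it. The function decode_row is hypothetical and is not the batchNMS plugin's code. This sketch raises an error on a mismatch, but a real plugin might instead silently read scores at the wrong offsets, which could plausibly produce garbage or NaN confidences.

```python
from typing import Sequence, Tuple


def decode_row(row: Sequence[float], num_classes: int) -> Tuple[int, float]:
    """Return (class_id, confidence) for one YOLOv5-style prediction row.

    Assumed layout: [x, y, w, h, objectness, score_0, ..., score_{nc-1}],
    with objectness and class scores already sigmoid-activated.
    """
    expected = 5 + num_classes
    if len(row) != expected:
        # A class-count mismatch means scores would be read at the wrong offsets.
        raise ValueError(
            f"row has {len(row)} values, expected {expected} for num_classes={num_classes}"
        )
    objectness = row[4]
    class_scores = row[5:]
    class_id = max(range(num_classes), key=lambda i: class_scores[i])
    # Confidence is objectness multiplied by the best class score.
    return class_id, objectness * class_scores[class_id]


# Hypothetical 2-class custom model: 5 + 2 = 7 values per row.
row = [320.0, 240.0, 50.0, 80.0, 0.9, 0.1, 0.8]
print(decode_row(row, num_classes=2))  # class 1, confidence about 0.72
try:
    decode_row(row, num_classes=80)  # COCO's 80 classes left unchanged
except ValueError as err:
    print(err)
```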