Model trained on custom model not working after converting to engine.

Question

Model trained on custom model not working after converting to engine.

rlrahulkanojia opened this issue 2 years ago · 6 comments

Hello, the repo works great and is able to convert without model without accuracy drop.

When I tried to train a model using yolov5 repo ( a80dd66efe0bc7fe3772f259260d5b7278aab42f ) the loss function throws an error and when I use the model trained using v6.1 of ultralytics yolov5 repo, the model output nan values but only when object is present.

How can train a model on custom data and deploy that to an engine file without having nan as confidence of output.

Thanks
Rahul

Answer 1 · 2022-08-15T03:16:57.000Z

When I tried to train a model using yolov5 repo ( a80dd66efe0bc7fe3772f259260d5b7278aab42f ) the loss function throws an error and when I use the model trained using v6.1 of ultralytics yolov5 repo, the model output nan values but only when object is present.

Do you mean you train with pretrained V6.1 model and the loss function throws an error ? Feels like it is related to training pipeline instead of inference?

Answer 2 · 2022-08-16T07:08:38.000Z

Yes, the repo gives the training issue but I fixed that with new changes from repo and training converged sucessfully but after converting the model, I was getting detection but with Nan confidence.

Answer 3 · 2022-08-16T07:14:01.000Z

So my questions:

How do you fix the issue ? Did you change the model architecture, bboxes parsing method or something else ?
What precision did you use for trt inference ? Did you try fp32 precision ?

Answer 4 · 2022-08-16T07:28:28.000Z

The main issue was in loss function defination which was also in main repo. So, I just replaced the class defination of ClassComputeLoss from next working commit from (a80dd66) and it worked.
Precision used for trt inference was fp16. Haven't tried fp32 precision yet. Will update the comment few hours wfter trying it out.

Reference:

ultralytics/yolov5#8644

Answer 5 · 2022-08-16T10:07:19.000Z

BTW, did you change the classes number to your custom dataset here: https://github.com/NVIDIA-AI-IOT/yolov5_gpu_optimization/blob/main/0001-Enable-onnx-export-with-batchNMS-plugin.patch#L155 ?

Answer 6 · 2022-08-16T10:52:41.000Z

Ah, got it. Thanks for letting me know. Works like a charm.