yasenh/libtorch-yolov5

Python and libtorch model prediction results are inconsistent

blueskywwc opened this issue · 8 comments

Hello, I have updated to YOLOv5 v4.0. I found that the prediction results of the Python model are a little different from the results predicted by the libtorch model; with v3.1 the results were identical. What could the reason be? Can you help me, thank you!

@blueskywwc I didn't try the latest yolov5 version (4.0). Did you export new torchscript models?

Yes, I exported a new model. I used yolov5s.pt for both versions with the same training parameters, but the sizes of the trained models are inconsistent: 14.6 MB (v4.0) vs. 15.0 MB (v3.1).
The model structure changed in the new version, so the libtorch output does not match the v4.0 prediction results. I am not sure what to modify; I would like to ask for your help, thank you!

@blueskywwc Have you solved this problem? With the same confidence and IoU thresholds, the libtorch version predicts very different results. However, when I fed a 1x1x640x640 tensor initialized with 1.0 into both the Python model and the libtorch model, the output tensors were consistent. Has some post-processing step changed?

@xuebuaa I have solved this problem. You need to rewrite LetterboxImage to be consistent with the Python version. This is the result of my rewriting:

std::vector<float> Detector::LetterboxImage(const cv::Mat& src, cv::Mat& dst, const cv::Size& input_size)
{
    auto src_h = static_cast<float>(src.rows);
    auto src_w = static_cast<float>(src.cols);

    float in_h = input_size.height;
    float in_w = input_size.width;

    // Scale so that the longer side fits the target size
    float scale = std::min(in_w / src_w, in_h / src_h);

    int mid_h = static_cast<int>(std::round(src_h * scale));
    int mid_w = static_cast<int>(std::round(src_w * scale));

    // Minimal rectangular padding: only pad up to the next multiple of the
    // model stride (32), matching the "auto" mode of the Python letterbox()
    int dw = static_cast<int>(in_w) - mid_w;
    int dh = static_cast<int>(in_h) - mid_h;
    float p_w = (dw % 32) / 2.0f;
    float p_h = (dh % 32) / 2.0f;

    cv::resize(src, dst, cv::Size(mid_w, mid_h));

    // Split the padding between both sides (the +/-0.1 rounding matches Python)
    int top = static_cast<int>(std::round(p_h - 0.1f));
    int bottom = static_cast<int>(std::round(p_h + 0.1f));
    int left = static_cast<int>(std::round(p_w - 0.1f));
    int right = static_cast<int>(std::round(p_w + 0.1f));

    cv::copyMakeBorder(dst, dst, top, bottom, left, right, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));

    std::vector<float> pad_info{static_cast<float>(left), static_cast<float>(top), scale};
    return pad_info;
}

@blueskywwc Thanks for replying. There really are some differences from the Python letterbox function used for padding the image. I tried yours with input size 2592x1944, but the output size is 640x480, not the required 640x640. While debugging, the top and bottom parameters passed to cv::copyMakeBorder are in fact zero. I also don't really understand why letterbox uses mod 32 to calculate the padding height and width.

@blueskywwc OK, thanks a lot!

Hello!

Thank you yasenh for your implementation of yolov5! My response is a little late but I hope to help anyone facing this issue in the future.

I have also noticed some inconsistencies in the model predictions between your implementation and Ultralytics' Python version. In my case, some predictions made in the Python version were missed in this C++ implementation.

I believe the issue is in the LetterBox function.

The Python version uses both INTER_AREA and INTER_LINEAR interpolation depending on the image size ratio, whereas this C++ implementation only uses INTER_LINEAR by default. Changing this to match the Python code fixed my issue.