BobLd/YOLOv4MLNet

Tips for modifying this for YoloV5

Transigent opened this issue · 28 comments

Hi, I was excited to find this project. I hoped that YoloV4 and YoloV5 were similar enough that I could use it to run my YoloV5 ONNX-exported model, but apparently it's not as simple as changing the path to the model and recompiling... :) Unfortunately I don't know a lot about the key model properties that you use in this code.

The first thing I noticed was that Netron reports different shapes for the inputs and outputs. The YoloV4 model is shaped:
Input { 1, 416, 416, 3 } and outputs { 1, 52, 52, 3, 85 }, { 1, 26, 26, 3, 85 }, { 1, 13, 13, 3, 85 }
and the YoloV5 model (YOLOV5l) is shaped:
Input { 1, 3, 640, 640 } and outputs { 1, 3, 80, 80, 19 }, { 1, 3, 40, 40, 19 }, { 1, 3, 20, 20, 19 }

I changed the code to reflect these differences along with the column names for the inputs and outputs.

I also changed the anchor values according to lines 8-10 (under anchors) here: https://github.com/ultralytics/yolov5/blob/master/models/yolov5l.yaml
as well as the SHAPES constants.

I changed all references to 416 to 640, as that's the default pixel dimension.

It appeared to run, but at the line
var results = predict.GetResults(classesNames, 0.3f, 0.7f);
I got 11,000 results. Clearly my changes were not enough.

I haven't changed the XYSCALE constants, as I am not sure what they are.

Do you have any thoughts about how I might get this to work? Is it actually likely to work at all, i.e. is YoloV5 too different to work with this code?

Thanks for any tips.
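For reference, the anchor and SHAPES values discussed above can be written out as C# constants. This is a sketch assuming the default anchors from the anchors section of ultralytics' models/yolov5l.yaml and a 640x640 input; the class name YoloV5Constants is hypothetical, and a custom-trained model may use different anchors. The SHAPES follow from input size / stride (640 / 8 = 80, 640 / 16 = 40, 640 / 32 = 20), matching the 80, 40 and 20 grids Netron reports.

```csharp
using System.Linq;

// Hypothetical container for the YOLOv5 constants discussed above.
public static class YoloV5Constants
{
    public const int InputSize = 640;

    // One stride per output head: P3/8, P4/16, P5/32.
    public static readonly int[] Strides = { 8, 16, 32 };

    // Default anchors from models/yolov5l.yaml, as { width, height } in input pixels,
    // indexed as Anchors[head][anchor]. Check your own .yaml for custom anchors.
    public static readonly float[][][] Anchors =
    {
        new[] { new[] { 10f, 13f }, new[] { 16f, 30f }, new[] { 33f, 23f } },      // P3/8
        new[] { new[] { 30f, 61f }, new[] { 62f, 45f }, new[] { 59f, 119f } },     // P4/16
        new[] { new[] { 116f, 90f }, new[] { 156f, 198f }, new[] { 373f, 326f } }, // P5/32
    };

    // SHAPES: grid size of each head = input size / stride.
    public static int[] GridSizes(int inputSize = InputSize) =>
        Strides.Select(s => inputSize / s).ToArray();
}
```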

BobLd commented

Hi @Transigent, do you have an onnx version of the model?

Hi BobLd

Thanks very much for the response, I really appreciate it!

Here is an ONNX export of the original YOLOV5l model pretrained on COCO with the default 80 classes. Unlike my version with 14 classes it will work with everyday images, plus I haven't contaminated it with training.
https://drive.google.com/drive/folders/18PpGCnQ4Ca4vRMBQbvSta3chqtuh2zXV?usp=sharing

Let me know if you need anything else.

BobLd commented

Thanks @Transigent.

A first comment concerning the input size: because the color dimension appears first in your model ({ 1, 3, 640, 640 }), I think you will need to set interleavePixelColors to false, to have something like this:
.Append(mlContext.Transforms.ExtractPixels(outputColumnName: "images", scaleImage: 1f / 255f, interleavePixelColors: false))

Also, can you give the link to the part where the image is pre-processed in the Python library? I need it to understand whether you need to scale the image by 1f / 255f or not.

I'll have a look soon at the rest.

EDIT: same remark for the output layers: the dimension of size 3 (the anchors) comes first ({ 1, 3, 80, 80, 19 }), so I think the offset will need to be changed accordingly.
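To make the layout difference concrete, here is a sketch of the row-major index arithmetic, assuming ML.NET hands back each tensor as a flat, row-major float[]. With interleavePixelColors: false the input is, as far as I understand it, planar (all red values, then all green, then all blue), and in the YOLOv5 outputs ({ 1, 3, 80, 80, 19 }) the leading 3 is the number of anchors per cell, so the anchor index moves to the front of the offset. The class and method names are hypothetical.

```csharp
// Hypothetical flat-index helpers for row-major tensors.
public static class TensorLayout
{
    // Input pixel, interleaved (HWC) layout, e.g. YOLOv4 { 1, 416, 416, 3 }.
    public static int InterleavedPixel(int y, int x, int c, int width, int channels) =>
        (y * width + x) * channels + c;

    // Input pixel, planar (CHW) layout, e.g. YOLOv5 { 1, 3, 640, 640 }.
    public static int PlanarPixel(int y, int x, int c, int width, int height) =>
        (c * height + y) * width + x;

    // Element k of anchor a at grid cell (y, x), YOLOv4 layout { 1, H, W, A, D }.
    public static int OutputOffsetV4(int y, int x, int a, int k, int gridW, int anchors, int dims) =>
        ((y * gridW + x) * anchors + a) * dims + k;

    // Same element, YOLOv5 layout { 1, A, H, W, D }: the anchor index comes first.
    public static int OutputOffsetV5(int y, int x, int a, int k, int gridH, int gridW, int dims) =>
        ((a * gridH + y) * gridW + x) * dims + k;
}
```

For the 14-class model, dims is 14 + 5 = 19, so the last element of the 80x80 head sits at index 3 * 80 * 80 * 19 - 1 = 364799.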

Great, once again thanks so much for your thoughts. I will sit down with the code after work tomorrow when I get a second, and try to see how it works. I'm having trouble understanding it at a glance.

The closest things that might have relevance are here and here

Also the anchor constants are here

Again thanks for all your time!

BobLd commented

@Transigent Any progress on your side? If not, I'll try to have a look soon

Hi @BobLd , sorry I haven't been in touch, we had to farewell our cat due to illness. It has been difficult. When the situation permits I will get back to this.

Hi @BobLd, I'm currently struggling to do the same thing @Transigent was trying to do :(
I'm using yolov4-tiny trained on my custom dataset. I've converted this model to ONNX format using a Python script from this repository: https://github.com/Tianxiaomo/pytorch-YOLOv4. I don't really understand how the VectorType attribute works, and googling didn't help me at all. For example, my model's output shape is [1,2535,1,4]; how will this be represented in a one-dimensional array?
I would really appreciate your help, and if needed I'll move this topic to another issue.

BobLd commented

Hi @Awakawaka,

You need your output as a float[]. Try creating a predictionEngine like I did here.

Then, if your output is of size [1, 2535, 1, 4], you can basically skip the 1s, leaving you with a 2D array of shape [2535, 4]. ML.Net will give you a 1D array representation of size 2535 x 4 = 10140.

My guess is that the size 4 is for a single bounding box (4 elements: x1, y1, x2, y2), and you have 2535 bounding boxes.

Try the following:

// output is your model's 1D prediction array of size 2535 * 4 = 10140
List<(float, float, float, float)> bboxes = new List<(float, float, float, float)>();
for (int i = 0; i < 2535 * 4; i+=4)
{
    bboxes.Add((output[i], output[i+1], output[i+2], output[i+3]));
}

=> bboxes will contain all your bounding box predictions. Then you will certainly need to do some post-processing of the bounding boxes coordinates.

Hope this helps

Hi guys,

I'm trying to do the same.
Is there any progress from your side?

Thanks a lot

BobLd commented

Hi all,

I've made some progress: the categories now seem to be correct, but not the bounding boxes yet. The part you are missing is that Detect() needs to be implemented in C# (it is not present in the ONNX model).

Please have a look here: https://github.com/BobLd/YOLOv4MLNet/tree/yolo-v5 for the latest progress.

Please find more info here:

post-processing is also done here:

Any help welcome on the missing bits!

Hi, I'm trying to use YOLOv5 with ML.Net and have a few questions about where your project is so far!

  1. Say I was to export my YOLOv5s.pt model to ONNX format with the Detect() layer included (following ultralytics/yolov5#343 (comment)). In your code, have you implemented any of the Detect() layer so far? If so, where?
  2. Further down in ultralytics/yolov5#343 (comment), they mention the following:

In both cases, you do miss the following:

- filtering results with objectness lower than some threshold
- NMS
- conversion from xc, yc, w, h to x1, y1, x2, y2

I've had a look at your code and it looks like you've implemented both the filtering results with objectness and conversion steps but not NMS. Is this correct?

Thanks so much!
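The first and third missing steps (objectness filtering and the xc, yc, w, h to x1, y1, x2, y2 conversion) can be sketched for a single row of a Detect()-included output. This assumes each row is laid out as [cx, cy, w, h, objectness, class scores...], as in the 25200-row export discussed further down; the final score is objectness multiplied by the best class probability. RowDecoder and DecodeRow are hypothetical names, and NMS would still run on the surviving boxes afterwards.

```csharp
// Hypothetical decoder for one row of a Detect()-included YOLOv5 output.
public static class RowDecoder
{
    // Returns null when the row is filtered out.
    public static (float X1, float Y1, float X2, float Y2, float Score, int ClassId)?
        DecodeRow(float[] output, int row, int numClasses, float scoreThres)
    {
        int dims = numClasses + 5;
        int o = row * dims;

        // (1) filter on objectness first (cheap early exit)
        float obj = output[o + 4];
        if (obj <= scoreThres) return null;

        // Best class by objectness * class probability
        int best = 0;
        float bestScore = float.MinValue;
        for (int c = 0; c < numClasses; c++)
        {
            float s = obj * output[o + 5 + c];
            if (s > bestScore) { bestScore = s; best = c; }
        }
        if (bestScore <= scoreThres) return null;

        // (2) (xc, yc, w, h) -> (x1, y1, x2, y2), still in model-input pixels
        float cx = output[o], cy = output[o + 1], w = output[o + 2], h = output[o + 3];
        return (cx - w / 2f, cy - h / 2f, cx + w / 2f, cy + h / 2f, bestScore, best);
    }
}
```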

I've managed to get something working, but my bounding boxes are always slightly off. Could this be the NMS routine too? I've taken the raw output by setting export=False and simply using this output (I did something similar with YOLOv4).

As it happens I'm not interested in the exact boxes for my application, just objects and their confidence rating so this works perfectly for me and super quick even on a CPU.

To be fair, what I'm seeing is exactly what is described here. So I think I'm definitely missing a final set of steps for the bounding boxes, but as I say, I'm only interested in confidences and labels at the moment.

https://towardsdatascience.com/object-detection-part1-4dbe5147ad0a

BobLd commented

@deanbennettdeveloper
Thanks for your help, I will have a look at the code you provided here. Did you modify anything else in the code? Your code basically replaces my NMS code, right?

@kellansteele:

  • Would you be able to share a link to a YoloV5 model with Detect included?
  • Detect (not exactly following the same structure, a bit messy):
    for (int i = 0; i < results.Length; i++)
    {
        var pred = results[i];
        var outputSize = shapes[i];
        for (int boxY = 0; boxY < outputSize; boxY++)
        {
            for (int boxX = 0; boxX < outputSize; boxX++)
            {
                for (int a = 0; a < anchorsCount; a++)
                {
                    // YOLOv5 heads are laid out { 1, anchors, H, W, classes + 5 }, so the anchor index comes first
                    var offset = ((a * outputSize + boxY) * outputSize + boxX) * (classesCount + 5);
                    var predBbox = pred.Skip(offset).Take(classesCount + 5).Select(x => Sigmoid(x)).ToArray(); // y = x[i].sigmoid()

                    // more info at
                    // https://github.com/ultralytics/yolov5/issues/343#issuecomment-658021043
                    // https://github.com/ultralytics/yolov5/blob/a1c8406af3eac3e20d4dd5d327fd6cbd4fbb9752/models/yolo.py#L29-L36

                    // postprocess_bbbox()
                    var predXywh = predBbox.Take(4).ToArray();
                    var predConf = predBbox[4];
                    var predProb = predBbox.Skip(5).ToArray();

                    var rawDx = predXywh[0];
                    var rawDy = predXywh[1];
                    var rawDw = predXywh[2];
                    var rawDh = predXywh[3];

                    float predX = ((rawDx * 2f) - 0.5f + boxX) * STRIDES[i];
                    float predY = ((rawDy * 2f) - 0.5f + boxY) * STRIDES[i];
                    float predW = (float)Math.Pow(rawDw * 2, 2) * ANCHORS[i][a][0];
                    float predH = (float)Math.Pow(rawDh * 2, 2) * ANCHORS[i][a][1];

                    // postprocess_boxes
                    // (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)
                    var box = Xywh2xyxy(new float[] { predX, predY, predW, predH });
                    float predX1 = box[0]; //predX - predW * 0.5f;
                    float predY1 = box[1]; //predY - predH * 0.5f;
                    float predX2 = box[2]; //predX + predW * 0.5f;
                    float predY2 = box[3]; //predY + predH * 0.5f;

                    // (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)
                    float org_h = ImageHeight;
                    float org_w = ImageWidth;

                    float inputSize = 640f;
                    float resizeRatio = Math.Min(inputSize / org_w, inputSize / org_h);
                    float dw = (inputSize - resizeRatio * org_w) / 2f;
                    float dh = (inputSize - resizeRatio * org_h) / 2f;

                    var orgX1 = 1f * (predX1 - dw) / resizeRatio; // left
                    var orgX2 = 1f * (predX2 - dw) / resizeRatio; // right
                    var orgY1 = 1f * (predY1 - dh) / resizeRatio; // top
                    var orgY2 = 1f * (predY2 - dh) / resizeRatio; // bottom
  • NMS (non-max suppression): as you can see, line 165 is commented out to deactivate NMS for testing purposes. Just uncomment the line to reactivate it:
    // Non-maximum Suppression
    postProcesssedResults = postProcesssedResults.OrderByDescending(x => x[4]).ToList(); // sort by confidence
    List<YoloV4Result> resultsNms = new List<YoloV4Result>();

    int f = 0;
    while (f < postProcesssedResults.Count)
    {
        var res = postProcesssedResults[f];
        if (res == null)
        {
            f++;
            continue;
        }

        var conf = res[4];
        string label = categories[(int)res[5]];

        resultsNms.Add(new YoloV4Result(res.Take(4).ToArray(), label, conf));
        postProcesssedResults[f] = null;

        var iou = postProcesssedResults.Select(bbox => bbox == null ? float.NaN : BoxIoU(res, bbox)).ToList();
        for (int i = 0; i < iou.Count; i++)
        {
            if (float.IsNaN(iou[i])) continue;
            if (iou[i] > iouThres)
            {
                //postProcesssedResults[i] = null; // deactivated for debugging
            }
        }
        f++;
    }
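The snippets above call Sigmoid, Xywh2xyxy and BoxIoU, whose bodies are not shown in the thread. A minimal sketch of what they could look like follows; these are standard definitions written for illustration, not necessarily the code in the yolo-v5 branch. BoxIoU reads only the first four elements, so the [x1, y1, x2, y2, conf, class] rows used by the NMS loop can be passed directly.

```csharp
using System;

// Hypothetical versions of the helpers referenced by the Detect and NMS snippets.
public static class YoloMath
{
    // Logistic sigmoid applied to the raw head outputs (y = x[i].sigmoid()).
    public static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

    // (cx, cy, w, h) -> (x1, y1, x2, y2)
    public static float[] Xywh2xyxy(float[] b) => new[]
    {
        b[0] - b[2] / 2f, b[1] - b[3] / 2f,
        b[0] + b[2] / 2f, b[1] + b[3] / 2f,
    };

    // Intersection over union of two (x1, y1, x2, y2) boxes.
    public static float BoxIoU(float[] a, float[] b)
    {
        float iw = Math.Max(0f, Math.Min(a[2], b[2]) - Math.Max(a[0], b[0]));
        float ih = Math.Max(0f, Math.Min(a[3], b[3]) - Math.Max(a[1], b[1]));
        float inter = iw * ih;
        float union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
        return union <= 0f ? 0f : inter / union;
    }
}
```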

Hi BobLd, that's great, thanks for having a look at this. My code just replaces the GetResults you have in the Prediction class you created. I didn't modify anything else other than the number of classes, as I have 15 rather than 80.

I used Roboflow.ai to create the YOLOv5 (small) model. It only takes about an hour to train. The only thing I modified there was making sure it grabs the latest HEAD of the YOLOv5 repo (currently it reverts to an older version). I initially had issues with the older version: the model worked but failed to export to ONNX.

The inference time though on a small yolov5 model is fantastic.

Thanks for the NMS examples too! I'll try those in my code too!

I guess this bit of my code does a basic type of NMS? Its current flaw is that it keeps only one detection per label, so it would not pick up multiple detections of the same label in different parts of the image.

List<YoloV5Result> r = new List<YoloV5Result>();

  foreach (var label in resultsNms.Select(p => p.Label).Distinct()) {
    // keep only the highest-confidence box for each label
    r.Add(resultsNms.Where(p => p.Label == label).OrderByDescending(p => p.Confidence).First());
  }
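One way around that flaw is to run greedy NMS separately for each label instead of keeping only the top box per label: a box is dropped only if it overlaps a higher-confidence box of the same label by more than the IoU threshold, so two cars in different parts of the frame both survive. This is a sketch; the Detection class below is a hypothetical stand-in for YoloV5Result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for YoloV5Result.
public sealed class Detection
{
    public float X1, Y1, X2, Y2, Confidence;
    public string Label;
    public Detection(float x1, float y1, float x2, float y2, string label, float confidence)
    { X1 = x1; Y1 = y1; X2 = x2; Y2 = y2; Label = label; Confidence = confidence; }
}

public static class ClassAwareNms
{
    static float IoU(Detection a, Detection b)
    {
        float iw = Math.Max(0f, Math.Min(a.X2, b.X2) - Math.Max(a.X1, b.X1));
        float ih = Math.Max(0f, Math.Min(a.Y2, b.Y2) - Math.Max(a.Y1, b.Y1));
        float inter = iw * ih;
        float union = (a.X2 - a.X1) * (a.Y2 - a.Y1) + (b.X2 - b.X1) * (b.Y2 - b.Y1) - inter;
        return union <= 0f ? 0f : inter / union;
    }

    // Greedy NMS per label: visit boxes by descending confidence and keep a box
    // only if it does not overlap an already-kept box of the same label too much.
    public static List<Detection> Apply(IEnumerable<Detection> detections, float iouThres = 0.5f)
    {
        var kept = new List<Detection>();
        foreach (var group in detections.GroupBy(x => x.Label))
        {
            var keptInGroup = new List<Detection>();
            foreach (var det in group.OrderByDescending(x => x.Confidence))
            {
                if (keptInGroup.All(k => IoU(k, det) <= iouThres))
                    keptInGroup.Add(det);
            }
            kept.AddRange(keptInGroup);
        }
        return kept;
    }
}
```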

@deanbennettdeveloper's code is a good starting point. Confidence and labels are calculated correctly, so only NMS and the IoU threshold need to be tuned to get the bounding boxes working.

Hey! I think I have properly fixed the bounding box issue in @deanbennettdeveloper's implementation, and I added @BobLd's NMS implementation. Could you take a look? Apart from these two functions, it is important to say that interleavePixelColors: false was the key. The rest of the code just adjusts dimensions to 640x640 images and the YOLOv5 output shape.

        public IReadOnlyList<YoloV5Result> GetResults(string[] categories, float scoreThres = 0.5f, float iouThres = 0.5f)
        {
            // Probabilities + characteristics: [cx, cy, w, h, objectness, class scores...]
            int characteristics = categories.Length + 5;

            // Needed info (with ResizingKind.Fill, x and y are scaled independently)
            float modelWidth = 640.0F;
            float modelHeight = 640.0F;
            float xGain = modelWidth / ImageWidth;
            float yGain = modelHeight / ImageHeight;
            float[] results = Output;

            // Number of prediction rows: 25200 for a 640x640 input
            int rows = results.Length / characteristics;

            List<float[]> postProcessedResults = new List<float[]>();

            // For every cell of the image, format for NMS
            for (int i = 0; i < rows; i++)
            {
                // Get offset in float array
                int offset = characteristics * i;

                // Get a prediction cell (Array.Copy avoids re-enumerating the output with Skip)
                var predCell = new float[characteristics];
                Array.Copy(results, offset, predCell, 0, characteristics);

                // Filter some boxes
                var objConf = predCell[4];
                if (objConf <= scoreThres) continue;

                // Get corners in original shape
                var x1 = (predCell[0] - predCell[2] / 2) / xGain; //top left x
                var y1 = (predCell[1] - predCell[3] / 2) / yGain; //top left y
                var x2 = (predCell[0] + predCell[2] / 2) / xGain; //bottom right x
                var y2 = (predCell[1] + predCell[3] / 2) / yGain; //bottom right y

                // Get real class scores
                var scores = predCell.Skip(5).Select(p => p * objConf).ToList();

                // Get best class and index
                float maxConf = scores.Max();
                float maxClass = scores.IndexOf(maxConf);

                // Discard low confs predictions (each box is added at most once)
                if (maxConf > scoreThres)
                {
                    // Format [ x1, y1, x2, y2, maxConf, maxClass ]
                    postProcessedResults.Add(new[] { x1, y1, x2, y2, maxConf, maxClass });
                }
            }

            var resultsNMS = ApplyNMS(postProcessedResults, categories, iouThres);

            return resultsNMS;
        }

        private List<YoloV5Result> ApplyNMS(List<float[]> postProcessedResults, string[] categories,  float iouThres=0.5f)
        {
            postProcessedResults = postProcessedResults.OrderByDescending(x => x[4]).ToList(); // sort by confidence
            List<YoloV5Result> resultsNms = new List<YoloV5Result>();

            int f = 0;
            while (f < postProcessedResults.Count)
            {
                var res = postProcessedResults[f];
                if (res == null)
                {
                    f++;
                    continue;
                }

                var conf = res[4];
                string label = categories[(int)res[5]];

                resultsNms.Add(new YoloV5Result(res.Take(4).ToArray(), label, conf));
                postProcessedResults[f] = null;

                var iou = postProcessedResults.Select(bbox => bbox == null ? float.NaN : BoxIoU(res, bbox)).ToList();
                for (int i = 0; i < iou.Count; i++)
                {
                    if (float.IsNaN(iou[i])) continue;
                    if (iou[i] > iouThres)
                    {
                        postProcessedResults[i] = null;
                    }
                }
                f++;
            }

            return resultsNms;
        }
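The hard-coded 25200 in the original GetResults is the total number of anchor boxes across the three heads: 3 anchors per cell times (80 x 80 + 40 x 40 + 20 x 20) cells = 3 x 8400 = 25200, and 25200 x 85 = 2142000 floats for the 80-class COCO model. A small sketch of that arithmetic, assuming the default strides of 8, 16 and 32 (the helper name is hypothetical):

```csharp
using System.Linq;

// Hypothetical helper: number of prediction rows in a Detect()-included output.
public static class YoloV5OutputSize
{
    public static int Rows(int inputWidth, int inputHeight, int anchorsPerCell, int[] strides) =>
        anchorsPerCell * strides.Sum(s => (inputWidth / s) * (inputHeight / s));
}
```

The same formula gives 10647 rows for a 416x416 input, so deriving the row count from the output length (or from this formula) avoids silently breaking when the input size changes.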
BobLd commented

Hi @raulsf6, sounds great!! Would you be able to push your work in https://github.com/BobLd/YOLOv4MLNet/tree/yolo-v5-incl?
I won't have the time to look at it today, but I think it would be great to have your work in a branch.

Concerning my implementation, more work needs to be done on the bounding boxes (fully implementing Detect() in C#), so it is expected that the bboxes you get are not correct, even with NMS activated.

@BobLd sure! I should say the final fix was setting resizing: ResizingKind.Fill, and now it works perfectly. I'm going to clean up the code and push it.

Hi @raulsf6 That's fantastic news, thanks for looking into this and adding those missing pieces! I'll give this a go too on my code and hopefully it will also fix it for me.

In terms of ResizingKind.Fill, I've taken a different approach and gone with padding, as that is also how I trained my model. So I guess the right setting depends on how the model was trained; this is a good article on that:

https://blog.roboflow.com/you-might-be-resizing-your-images-incorrectly/

Thank you both again (@BobLd & @raulsf6) for the work on this. Looks like we've finally got a full Yolov5 working with ML.NET.

Hi @raulsf6 , you are definitely right with the ResizingKind.Fill setting. I've now changed to this with your code and the boxes are perfect. However I'm not getting as good detection with this method, I'll have more of a play with the model and different training image settings. I'm sure I'll hit a sweet spot eventually. Thanks for the code, it works great.

Hi @deanbennettdeveloper and @BobLd, I just made a pull request with the code. Thanks to your work I could learn a lot about computer vision and Yolo. Check the code out when you have some time and tell me if something could be improved!

Hi @raulsf6, I've worked out why ResizingKind.IsoPad wasn't working correctly. It dawned on me that the box coordinates in the results were based on a padded image, which has black bars either at the top and bottom or at the left and right, depending on the aspect ratio. Those same coordinates were then being applied to the original image, which hadn't been padded. So the trick is to calculate the xoffset and yoffset that would have been used in the model's image transformation, and use these to adjust xGain and yGain as well as to translate centre x and centre y.

Here's my original code (before your NMS) but if you do decide to use the IsoPad and train model with padding then this is what you'll need. I am finding much better confidence results with the padded image model as there is no distortion of the image.

public IReadOnlyList<YoloV5Result> GetResults(string file, float imageWidth, float imageHeight, string[] categories, float scoreThres = 0.5f, float iouThres = 0.5f) {

  int size = Output.Length; // 1x25200x85=2142000
  int dimensions = categories.Length + 5;
  int rows = size / dimensions; // 25200
  int confidenceIndex = 4;
  int labelStartIndex = 5;
  float modelWidth = 640.0F;
  float modelHeight = 640.0F;

  // Padding (black bars) added on each side by ResizingKind.IsoPad
  float xoffset = 0;
  float yoffset = 0;

  float aspectratio = imageWidth / imageHeight;

  if (aspectratio > 1) { // most common for videos
    float actual_imageheight = modelWidth / aspectratio;
    yoffset = (modelHeight - actual_imageheight) / 2F;
  } else { // I guess common for mobile phone videos
    float actual_imagewidth = modelHeight * aspectratio;
    xoffset = (modelWidth - actual_imagewidth) / 2F;
  }

  float xGain = (modelWidth - (xoffset * 2)) / imageWidth;
  float yGain = (modelHeight - (yoffset * 2)) / imageHeight;

  List<YoloV5Result> resultsNms = new List<YoloV5Result>();

  for (int i = 0; i < rows; ++i) {

    int index = i * dimensions;
    float objConf = Output[index + confidenceIndex];
    if (objConf <= scoreThres) continue;

    // Remove the padding once per row (not once per class) without mutating Output
    float cx = Output[index] - xoffset;
    float cy = Output[index + 1] - yoffset;
    float w = Output[index + 2];
    float h = Output[index + 3];

    var x1 = (cx - w / 2) / xGain; //top left x
    var y1 = (cy - h / 2) / yGain; //top left y
    var x2 = (cx + w / 2) / xGain; //bottom right x
    var y2 = (cy + h / 2) / yGain; //bottom right y

    for (int k = labelStartIndex; k < dimensions; ++k) {

      // Class score = class probability * objectness
      float confidence = Output[index + k] * objConf;
      if (confidence <= scoreThres) continue;

      string label = categories[k - labelStartIndex];
      resultsNms.Add(new YoloV5Result(x1, y1, x2, y2, label, confidence, ""));
    }
  }

  // Keep only the highest-confidence box per label
  List<YoloV5Result> r = new List<YoloV5Result>();

  foreach (var label in resultsNms.Select(p => p.Label).Distinct()) {
    r.Add(resultsNms.Where(p => p.Label == label).OrderByDescending(p => p.Confidence).First());
  }

  return r;

}
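The two resizing modes discussed above need different mappings from model coordinates back to the original image. With Fill, x and y are stretched independently, so each axis is divided by its own gain; with IsoPad, a single ratio is applied and the padding must be subtracted first. A sketch of both, assuming a centred IsoPad with equal bars on both sides (as the code above assumes); the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helpers mapping a box from model-input pixels to original-image pixels.
public static class BoxMapping
{
    // ResizingKind.Fill: independent x and y gains.
    public static (float X1, float Y1, float X2, float Y2) FromFill(
        float x1, float y1, float x2, float y2,
        float imageWidth, float imageHeight, float modelWidth = 640f, float modelHeight = 640f)
    {
        float xGain = modelWidth / imageWidth;
        float yGain = modelHeight / imageHeight;
        return (x1 / xGain, y1 / yGain, x2 / xGain, y2 / yGain);
    }

    // ResizingKind.IsoPad (letterbox): one ratio, then remove the centred padding.
    public static (float X1, float Y1, float X2, float Y2) FromIsoPad(
        float x1, float y1, float x2, float y2,
        float imageWidth, float imageHeight, float modelWidth = 640f, float modelHeight = 640f)
    {
        float ratio = Math.Min(modelWidth / imageWidth, modelHeight / imageHeight);
        float padX = (modelWidth - imageWidth * ratio) / 2f;
        float padY = (modelHeight - imageHeight * ratio) / 2f;
        return ((x1 - padX) / ratio, (y1 - padY) / ratio, (x2 - padX) / ratio, (y2 - padY) / ratio);
    }
}
```

For example, a 1280x720 frame letterboxed into 640x640 is scaled by 0.5 and gets 140-pixel bars at the top and bottom, so a box covering the visible area (0, 140, 640, 500) maps back to the full frame (0, 0, 1280, 720).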
BobLd commented

@deanbennettdeveloper, if you think it is useful, please make a commit to the same branch as @raulsf6. It could be useful to have everything in a single branch.

I wanted to work in C# with the latest YOLOv5 (release 4). The script provided by @BobLd #2 (comment) came close but did not work for me. I'm still not sure why, but the way the offset was calculated produced wrong data for my model's output.

I had to rewrite and optimize things here and there along the way. As this topic/repo was very useful, I wanted to give back a bit by sharing my script: https://gist.github.com/keesschollaart81/83de609f0852670656290fe0180da318. I'm not using ML.NET, but 96% of the code can be reused if you like.

It's basically my C# rewrite of this

BobLd commented

@keesschollaart81, amazing! I'll add that to the ReadMe.