luxonis/depthai

[BUG] YOLOv4 Postprocessing Issues

Describe the bug
YOLOv4 outputs noisy and inaccurate bounding boxes when run through the dai.YoloDetectionNetwork node.

Minimal Reproducible Example
The base app in https://docs.luxonis.com/software/depthai/examples/tiny_yolo/, using YOLOv4, specifically yolo-v4-tiny-tf.

Expected behavior
Accurate bounding boxes around the detected object

Screenshots
image

Another user also experienced similar issues on the forum a while back.
https://discuss.luxonis.com/d/744-crazy-yolov4-tiny-detections-from-depthai-python-examples

While they found a "fix" for the problem, the system still suffers from inaccurate bounding boxes.
image

Additional context
TLDR:
The OpenVINO YOLOv4 model requires an additional sigmoid call on the x,y center output.

After looking through lots of documentation about the models, I came across the OpenVINO specifications for them:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tiny-tf
and
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tiny-tf

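The only difference I could find is that, for v4, the confidence and x,y centers require a sigmoid to get accurate values.
This is also the underlying reason you can pass a confidence threshold higher than one to the v4 model and still get detections, presumably because the threshold is compared against pre-sigmoid values. As a workaround, the threshold can be converted: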
CONFIDENCE_THRESHOLD = 0.9
actual_threshold = inverse_sigmoid(CONFIDENCE_THRESHOLD) if YOLO_VERSION == "4" else CONFIDENCE_THRESHOLD
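
The inverse_sigmoid helper used above is not defined in this issue. A minimal sketch, assuming it is the standard logit function (the inverse of the logistic sigmoid), could look like this:

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p: float) -> float:
    """Logit: the raw score whose sigmoid equals p (assumed meaning of the helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


CONFIDENCE_THRESHOLD = 0.9
YOLO_VERSION = "4"

# For v4 the threshold is moved into pre-sigmoid space; otherwise it is used as-is.
actual_threshold = (
    inverse_sigmoid(CONFIDENCE_THRESHOLD) if YOLO_VERSION == "4" else CONFIDENCE_THRESHOLD
)
```

With this definition a threshold of 0.9 becomes ln(9), roughly 2.2, which would explain why values above one still produce detections.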

This led me to recreate the YOLO postprocessing pipeline from https://github.com/Tianxiaomo/pytorch-YOLOv4/tree/master in NumPy rather than torch, adding the sigmoid function in the correct place.

With that change I now have a functional YOLOv4 model. The key line is:
bxy = sigmoid(bxy) * scale_x_y - 0.5 * (scale_x_y - 1)
Commenting and un-commenting this line toggles between the noisy outputs shown above and the expected outputs with smooth precise bounding boxes.
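
For context, here is a rough NumPy sketch of where that line sits in the center decoding. It is not my exact demo code: the function name decode_centers, the (H, W, 2) input layout, and the grid-offset step are assumptions based on the usual YOLO decoding, and scale_x_y has to be taken from the model config.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


def decode_centers(raw_xy, scale_x_y):
    """Decode raw (tx, ty) outputs for one anchor into normalized box centers.

    raw_xy: array of shape (H, W, 2), one raw offset pair per grid cell
            (assumed layout, adjust to match the real output tensor).
    Returns an array of shape (H, W, 2) with centers in [0, 1] image coordinates.
    """
    grid_h, grid_w = raw_xy.shape[:2]

    # The line from this issue: squash the raw offsets, then apply the scale.
    bxy = sigmoid(raw_xy) * scale_x_y - 0.5 * (scale_x_y - 1)

    # Add each cell's column/row index and normalize by the grid size.
    cols, rows = np.meshgrid(np.arange(grid_w), np.arange(grid_h))
    cx = (bxy[..., 0] + cols) / grid_w
    cy = (bxy[..., 1] + rows) / grid_h
    return np.stack([cx, cy], axis=-1)


# Raw zeros decode to the middle of each cell.
centers = decode_centers(np.zeros((2, 3, 2)), scale_x_y=1.05)
```

Skipping the sigmoid here leaves the offsets unbounded, so a center can land far outside its own grid cell, which is consistent with the noisy boxes in the screenshots.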

I tried to find where the postprocessing for these models is done in your GitHub repos, but I couldn't find anything, which leads me to believe the logic sits in some kind of proprietary codebase. Otherwise I would have created a PR.

I am relatively new to open source and issues, so please let me know if I can give you more info.
I can also upload my modified YOLO demo code if that would help.