ifzhang/ByteTrack

Use my own detection model

Tetsujinfr opened this issue · 21 comments

Hi
Thanks for sharing this project.
If I want to use my own detection model with your tracker, what/where is the main entry point for adapting your code to replace the yolox detection model?
Do I need to retrain everything, or can I inject the detections into your pretrained model for inference?
Thanks

You can add your detection model here:
https://github.com/ifzhang/ByteTrack/blob/main/yolox/evaluators/mot_evaluator.py
And pass your detection results to BYTETracker to get the tracking results.
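
Schematically, the hand-off looks like this (a minimal sketch; my_detector, frames and the image dims are placeholders, and the exact layout of the update() arguments is discussed further down this thread):

    from yolox.tracker.byte_tracker import BYTETracker

    tracker = BYTETracker(args)  # args carries track_thresh, track_buffer, match_thresh, mot20
    for frame in frames:
        dets = my_detector(frame)  # n x 5 array: x1, y1, x2, y2, score
        online_targets = tracker.update(dets, (img_h, img_w), (img_h, img_w))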

So I have been trying to use my own detections, and it works. However, my results are less stable in occlusion scenarios than with the default detector in this repo.
I am testing on the video below; with your detector + ByteTrack I get 3 ids tracked (perfect, using match_thresh=0.85 and the x model).
When using my own detections (yolov5, only 1 class and few low-confidence/false-positive boxes), though, I end up with 6 ids (the last 3 occlusions generate new tracks).

I have played with track_buffer, match_thresh, and my detection confidence threshold (I went as low as 0.001% to get more than 2 or 3 detections per frame), but without success.

Which parameter do you recommend I test further? E.g. I did not observe any impact when increasing track_buffer, but I am not sure why.

thanks

video1.mp4

OK, so there is a scale post-processing factor in the tracker.update() method, which downscales detections by a factor of 1.48 in my test case. I guess this scale factor is tied to the exp.test_size of the repo's yolox detector, so if one uses detections from another detector, this rescaling needs to be turned off.
Also, there is a hard-coded floor confidence level of 0.1 in the update() method to discard detections with very low confidence; I did not expect this threshold to be hard-coded.
Nonetheless, even after tweaking those details, I cannot get results as accurate as with the yolox detector, so I guess it comes down to my detector not being as good as the default one in the repo.
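
For reference, the relevant lines (lightly condensed from yolox/tracker/byte_tracker.py; both the rescaling and the 0.1 floor are visible here):

    # inside BYTETracker.update(output_results, img_info, img_size)
    img_h, img_w = img_info[0], img_info[1]
    scale = min(img_size[0] / float(img_h), img_size[1] / float(img_w))
    bboxes /= scale                                  # rescale to the original image size
    remain_inds = scores > self.args.track_thresh    # high-confidence detections
    inds_low = scores > 0.1                          # hard-coded low-score floor

So passing the original image size as both img_info and img_size makes scale == 1 and effectively turns the rescaling off.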

One question for @ifzhang: is there a parameter to discard tracks with very short durations, e.g. tracks that exist for fewer than 2 frames, or should I implement one if I need this?
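
In the meantime, a minimal post-filter I could apply on my side (just a sketch; it relies on the tracklet_len counter carried by the STrack objects that update() returns in this repo):

    # hypothetical threshold: drop tracks that have existed for fewer than 2 frames
    MIN_TRACK_LEN = 2
    stable_targets = [t for t in online_targets if t.tracklet_len >= MIN_TRACK_LEN]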

thanks

Hi @ifzhang
Is there a way to get the correspondence between the detection input ids and the track ids returned by BYTETracker?
I provide the tracker with a lot of bboxes, given the low confidence scores used as inputs; as expected, the tracker returns far fewer bboxes. But how do I know which detection bboxes have been retained and which have been discarded by the tracker, for a given frame?

Basically, how do I reconcile the "output_results" input with the "online_targets" returned by the tracker? Is there a property on the online_targets objects which contains the original detection indices, for example?
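
The best workaround I can think of so far (just a sketch, not a repo feature) is to match each returned track box back to the input detections by IoU:

    import numpy as np

    def match_tracks_to_dets(online_targets, dets, iou_thresh=0.9):
        """Map each track id to the index of the input detection (row of the
        n x 5 array passed to update()) with the highest IoU overlap."""
        def iou(a, b):
            x1, y1 = max(a[0], b[0]), max(a[1], b[1])
            x2, y2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            return inter / (area_a + area_b - inter + 1e-9)

        mapping = {}
        for t in online_targets:
            x, y, w, h = t.tlwh
            box = np.array([x, y, x + w, y + h])
            ious = [iou(box, d[:4]) for d in dets]
            best = int(np.argmax(ious))
            if ious[best] >= iou_thresh:
                mapping[t.track_id] = best
        return mapping

One caveat: the tracker outputs Kalman-smoothed boxes, so the IoU with the raw detection is rarely exactly 1.0 and the threshold may need loosening.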

thanks a lot

Hi @Tetsujinfr
How did you adapt your code to replace the yolox detection model? Please help me use yolov5 instead of yolox!

You need to feed an n x 5 array as the first argument to BYTETracker.update(). The first 4 columns represent the bbox coordinates of your yolov5 detections (x1, y1, x2, y2), and the fifth column is the confidence score of the bbox.
Pay attention though: there are a couple of hard-coded thresholds in the class code, namely 0.1 as a low-score confidence filter and 0.5 as a tracking threshold (if I remember well). You may need to tweak those based on the distribution of the confidence scores of the detector you use.
It does work, but for some reason I get much less stable results than with yolox: the tracker does not acquire some of the detections that I feed to it, even though they have high confidence scores, and I have no idea why. I need to debug this further. Good luck.
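
Concretely, something like this worked on my side (a minimal sketch; TrackerArgs is a hypothetical stand-in for the repo's argparse namespace, and video1.mp4 is a placeholder):

    import cv2
    import torch
    from yolox.tracker.byte_tracker import BYTETracker

    class TrackerArgs:          # hypothetical stand-in for the command-line args
        track_thresh = 0.5      # high-confidence detection threshold
        track_buffer = 30       # frames a lost track is kept alive
        match_thresh = 0.8      # association threshold
        mot20 = False

    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # yolov5 detector
    tracker = BYTETracker(TrackerArgs())

    cap = cv2.VideoCapture('video1.mp4')
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # yolov5 hub models expect RGB
        det = model(rgb).xyxy[0].cpu().numpy()         # columns: x1, y1, x2, y2, conf, cls
        dets = det[:, :5]                              # the n x 5 array described above
        h, w = frame.shape[:2]
        # passing the frame size as both img_info and img_size makes the
        # internal rescaling a no-op (scale == 1)
        online_targets = tracker.update(dets, (h, w), (h, w))
        for t in online_targets:
            print(t.track_id, t.tlwh, t.score)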

Thanks, I load yolov5s from torch hub and feed its detections to the ByteTrack update function; it seems to work.

Hi @Tetsujinfr, any progress?
I use yolov5 instead of yolox; the detections work well, but the tracking is pretty bad, and I still don't know why.

I am still debugging to understand why some tracks are much worse when using my yolov5 detections rather than the yolox ones. I suspect it may come from the Kalman filter initial parameters: my detections have an aspect ratio close to 1.0 (that is normal and OK on my side) while the yolox ones from the repo are > 1.6 (human bodies), and I see that the detection aspect ratio plays a specific role in the Kalman filter formulation. Still searching on this...
@ifzhang, if I am tracking round or square detection shapes, shall I modify some of the Kalman filter params you use? Shall I even change the filter formulation? Thanks

@Tetsujinfr To solve your problem you can modify lines 270 and 271 of https://github.com/ifzhang/ByteTrack/blob/1926ce65c1c3b1320e229cc91b60e5867fb0244b/tools/demo_track.py
as follows:

from:

                vertical = tlwh[2] / tlwh[3] > args.aspect_ratio_thresh
                if tlwh[2] * tlwh[3] > args.min_box_area and not vertical:

to:

                #vertical = tlwh[2] / tlwh[3] > args.aspect_ratio_thresh
                if tlwh[2] * tlwh[3] > args.min_box_area:  # and not vertical:

Yes, I saw the vertical bool flag and I had already deactivated it, but that did not change much, if anything, at the time. I think there is something more profound that is optimized for the yolox detection shapes and dynamics.
Thanks anyway for your comment.

@Tetsujinfr I run on MOT17, so the aspect ratio is not the problem. Do you have a solution now?

From what I have seen while debugging through the code, I think the problem is actually trivial: my detection bbox areas are quite small, and in the video I am testing the objects move fast, so there is no IoU overlap between two consecutive frames for the same bbox. Consequently the tracker loses the object, I think, even with the help of the Kalman filter predictions.
When using the yolox detector the bbox areas are larger (I am not tracking persons, unlike the default setup of the repo), so it is more likely that bboxes overlap across two consecutive frames. I have tried to increase the default Kalman filter velocity value, but that did not help.
So now I am working on scaling up my detection areas and on adding camera motion compensation to the Kalman filter to improve the next-frame predictions of the objects. Standard next steps, I think.

Hi @Tetsujinfr,
I am facing the same issue: I am running yolox to detect moving vehicles. The detections are pretty good, but the tracker misses many tracks. Do you have any update on that issue?

Thanks

It really depends on whether your vehicles move fast relative to their size in the image. If they move fast and can shift by more than the bbox size within one frame, then artificially expanding the bbox sizes fed to the tracker should definitely help (but you may get more id duplicates as well). I did it and it worked quite well for me. You could also remove the condition in the code which prevents predictions from being accepted if they are too far from the current position, but I am not sure that is a safe move; I did not analyse the code enough.

I think the proper way to manage all this is to tweak the Kalman filter parameters, or the Kalman filter model itself. I am looking into it but I have to do some homework first to do that properly in the code.
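
For concreteness, the knobs I mean live in yolox/tracker/kalman_filter.py (the values below are the repo's defaults; changing them is an experiment on my side, not a recommendation):

    # in KalmanFilter.__init__ (yolox/tracker/kalman_filter.py)
    self._std_weight_position = 1. / 20    # uncertainty scale on the box position
    self._std_weight_velocity = 1. / 160   # uncertainty scale on the motion model
    # raising the velocity weight lets the filter tolerate faster inter-frame motion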

Also, if your camera is moving, compensating for that motion is definitely a must. You can use Lucas-Kanade optical flow to approximate the x/y shifts frame by frame (you will add at least one frame of latency, though, plus some extra compute load). You would have to mask out the dynamic parts of the image for a robust result, i.e. mask out your moving vehicles in your use case.
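
A sketch of what I mean (assuming OpenCV; it estimates a single global x/y shift with Lucas-Kanade flow on corner features, with a mask to exclude the moving objects):

    import cv2
    import numpy as np

    def estimate_camera_shift(prev_gray, gray, mask=None):
        """Approximate the global camera shift between two grayscale frames.
        `mask` should zero out dynamic regions (the tracked vehicles here)."""
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10, mask=mask)
        if pts is None:
            return 0.0, 0.0
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = status.ravel() == 1
        if not good.any():
            return 0.0, 0.0
        flow = (nxt[good] - pts[good]).reshape(-1, 2)
        dx, dy = np.median(flow, axis=0)   # median is robust to outlier tracks
        return float(dx), float(dy)

You can then shift either the detections or the Kalman predictions by the estimated (dx, dy) before matching (the sign depends on which side you correct).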

Thank you for all of this information; I will work on implementing it. However, what do you mean by expanding the bbox size? Do you mean scaling them up before passing them to the tracker?

" However, what do you mean by expanding the bbox size? Do you mean to scale them up before passing them to the tracker?"
Yes exactly. If there is a privileged dimension for your use case, e.g. horizontal movements, then you can only expand the width, instead of expanding both width and height. For my use case, expanding by a factor of 2 was enough, but this is a quick and dirty trick, better to work on upgrading the kalman filter I think. Or using other matching algos if you do not need online processing.
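
A sketch of that expansion (dets being the n x 5 array fed to the tracker; the factor of 2 is just what worked for my use case):

    import numpy as np

    def expand_boxes(dets, factor=2.0):
        """Scale x1,y1,x2,y2 boxes around their centers by `factor`
        before passing them to the tracker."""
        out = dets.copy()
        cx = (dets[:, 0] + dets[:, 2]) / 2
        cy = (dets[:, 1] + dets[:, 3]) / 2
        half_w = (dets[:, 2] - dets[:, 0]) * factor / 2
        half_h = (dets[:, 3] - dets[:, 1]) * factor / 2
        out[:, 0], out[:, 1] = cx - half_w, cy - half_h
        out[:, 2], out[:, 3] = cx + half_w, cy + half_h
        return out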

@Tetsujinfr, I don't think it is that complicated. I recommend this project: https://github.com/yhsmiley/bytetrack_realtime. Also, before using this tracker, run NMS first with a score threshold of 0.1 and a suitable IoU threshold (the author also mentioned this in the ByteTrack issues). It does help, and the results are decent, but to reach the author's level of results, I feel it still mostly comes down to the detector.

@wuyuanmm thanks for the link. Very clean repo for the standalone tracker.
I am clear on the limitations of the ByteTrack tracker now; for my use cases they primarily came down to these few things, I think:

  1. the tracker drops ids if the detection bboxes do not overlap from one frame to the next, so for a 'fast'-moving object the initial detection is lost by the tracker. I have tested that.
  2. the Kalman filter is a linear velocity model, so videos with a (significantly) moving camera cause some performance issues. But that is expected, so no big surprise.
  3. the original repo is optimized for portrait-ratio objects (the code filters out boxes whose width/height ratio exceeds 1.6, cf. the vertical flag above), so square objects may get slightly less accurate tracking than human bodies, for instance.
  4. the low-boundary confidence score is hard-coded to 0.1 in the code; I suspect this threshold fits the yolox probability score range well, but it may have to be tuned differently for other detectors.

I am closing this issue since I have been able to input my own detections into the tracker, so all good.
Thanks for sharing your great work @ifzhang

@Tetsujinfr could you show me how you replaced yolox with yolov5 in Python code?

For fast-moving objects / low fps / a moving camera, maybe using ReID features is a better choice?