Handle Small/Tiny and Fast/High Speed moving object detection/tracking with stable inference
Closed this issue · 21 comments
Search before asking
- I have searched the Yolo Tracking issues and found no similar bug report.
Question
Hello @mikel-brostrom, I am involved in a project that predicts ball-in-hole classification. I am having a discussion in the ultralytics issue below:
ultralytics/ultralytics#7109
That issue also contains my result clip demo, training results and other relevant information.
Then I found your repo, which plugs in different SORT tracking algorithms. From your experience tracking small/high-speed objects like a golf ball, do you have any ideas or strategies for keeping track of them reliably? I am tempted to re-train the ReID model.
If you have a better idea for my case, please help me; I am a noob with computer vision, so I really need your advice.
Thank you so much
ReID won't solve your tracking issues. High-speed objects have high motion uncertainty, especially if your camera's frame rate is very low. In those cases IoU is not a viable association metric, as there will most certainly be no overlap at all between the bboxes. A centroid-based metric is therefore more suitable.
@tgbaoo I have built a bunch of code around a centroid-based system and got it to work quite well. It's really quite a different paradigm than IoU and there are a few gotchas. Let me know if you want some help.
import numpy as np

def centroid_batch(bboxes1, bboxes2, w, h):
    """
    Computes the normalized centroid distance between two sets of bounding boxes.
    Bounding boxes are in the format [x1, y1, x2, y2].
    w, h (width, height) are used to normalize the distance.
    """
    # Calculate centroids
    centroids1 = np.stack(((bboxes1[..., 0] + bboxes1[..., 2]) / 2,
                           (bboxes1[..., 1] + bboxes1[..., 3]) / 2), axis=-1)
    centroids2 = np.stack(((bboxes2[..., 0] + bboxes2[..., 2]) / 2,
                           (bboxes2[..., 1] + bboxes2[..., 3]) / 2), axis=-1)
    # Expand dimensions for broadcasting: (N, 1, 2) vs (1, M, 2)
    centroids1 = np.expand_dims(centroids1, 1)
    centroids2 = np.expand_dims(centroids2, 0)
    # Calculate Euclidean distances between every pair of centroids
    distances = np.sqrt(np.sum((centroids1 - centroids2) ** 2, axis=-1))
    # Normalize by the image diagonal
    norm_factor = np.sqrt(w**2 + h**2)
    normalized_distances = distances / norm_factor
    return normalized_distances
The output is normalized with respect to the diagonal of the image in pixels, that is, the maximum possible distance between two centroids in the input image.
In order to use the centroid cost generated above interchangeably with iou_cost, the cost matrix should be inverted:
return 1 - normalized_distances
The rationale for the thresholding is then that objects beyond a certain distance are ignored, instead of basing the threshold on overlap.
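As a self-contained illustration (it repeats the centroid function from above so the snippet runs on its own), here is how the inverted cost gates matches on a toy pair of tracks and a single detection in a 1920x1080 frame:

```python
import numpy as np

def centroid_batch(bboxes1, bboxes2, w, h):
    # Same logic as the function above, condensed for a standalone example
    c1 = np.stack(((bboxes1[..., 0] + bboxes1[..., 2]) / 2,
                   (bboxes1[..., 1] + bboxes1[..., 3]) / 2), axis=-1)
    c2 = np.stack(((bboxes2[..., 0] + bboxes2[..., 2]) / 2,
                   (bboxes2[..., 1] + bboxes2[..., 3]) / 2), axis=-1)
    d = np.sqrt(np.sum((c1[:, None] - c2[None]) ** 2, axis=-1))
    return d / np.sqrt(w ** 2 + h ** 2)

tracks = np.array([[100., 100., 120., 120.],   # centroid at (110, 110)
                   [900., 500., 940., 540.]])  # centroid at (920, 520)
dets   = np.array([[130., 100., 150., 120.]])  # centroid at (140, 110)

cost = centroid_batch(tracks, dets, w=1920, h=1080)
sim = 1 - cost            # invert so it is interchangeable with an IoU score
threshold = 0.7           # accept matches within 30% of the image diagonal
matches = sim >= threshold
# Track 0 is only 30 px away, so it passes the gate; track 1 is ~881 px away
# (~40% of the diagonal), so it is gated out.
```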
A ReID model won't help you because all the golf balls basically look the same @tgbaoo. But maybe you are detecting more objects?
You can now try:
from boxmot import OCSORT

tracker = OCSORT(
    asso_func="centroid",
    iou_threshold=0.3  # use this to set the centroid threshold that matches your use case best
)
or
from boxmot import DeepOCSORT

tracker = DeepOCSORT(
    asso_func="centroid",
    iou_threshold=0.3  # use this to set the centroid threshold that matches your use case best
)
iou_threshold
now acts as a centroid distance threshold rather than an overlap threshold. An iou_threshold of 0.7 means that only objects within 30% of the image's diagonal in pixels will be accepted as possible matches.
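To make the arithmetic concrete, here is what that threshold translates to in pixels for a Full HD frame (the numbers are just an example, not boxmot defaults):

```python
import math

# For asso_func="centroid", a threshold of 0.7 accepts matches whose
# centroids lie within 1 - 0.7 = 30% of the image diagonal.
w, h = 1920, 1080
diagonal = math.hypot(w, h)                  # image diagonal in pixels
threshold = 0.7
max_match_dist = (1 - threshold) * diagonal  # max accepted centroid distance
```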
At the moment this only works for OCSORT and DeepOCSORT in 10.0.50. Let me know if you get better results now @tgbaoo. Please drop some example videos if possible so that we can analyze the results on your custom use case. We take it from there.
@mikel-brostrom, @colonelpanic8, thanks for your passion and for quickly implementing the new centroid-based cost logic. At this stage I only detect the golf ball; in a future stage I will also detect the golfer, so the app can show which player is taking the shot and which user each result belongs to.
I am applying the new logic now, so the result clips will be updated soon.
Thank you so much for your advice and support once again.
@mikel-brostrom I already ran a quick test and the results are magically robust. I am writing some code to imwrite the video; I will post some results soon, stay tuned!
Thanks again to @colonelpanic8 and @mikel-brostrom for the support <3
@colonelpanic8, @mikel-brostrom Here are some of my test results with OCSORT:
duy_swing_result.2.mp4
test_putt_cam_03_trimed.mp4
test_putt_cam_01_trimed.mp4
I am going to test DeepOCSORT and update this same comment. I noticed that this works well for fast motion (like golf putting) but does not seem as good when the ball is both smaller and fast-moving (like the golf swing clip). Do you have any solution or idea for handling an insanely fast-moving and shrinking object like the ball in a golf swing?
P.S. One more question: if I apply this to the appearance-based trackers (DeepSORT, DeepOCSORT, HybridSORT, StrongSORT), do I still have to use a ReID model? You said ReID would not help my project because all the balls look basically the same.
Exactly what I was expecting. Tracking objects with dynamic speed (fast to slow / slow to fast) is a matter of adaptive Kalman filters. To get this working you will need to carefully study the system and its dynamics. So it makes total sense that this works for more static dynamics like putting but not for swinging.
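To illustrate the adaptive-KF idea (this is a minimal 1D sketch, not boxmot's filter; q_base and q_gain are made-up tuning knobs), process noise can be scaled with the current speed estimate so fast motion loosens the motion model while slow motion keeps it tight:

```python
import numpy as np

def adaptive_kf_1d(measurements, dt=1.0, r=1.0, q_base=0.01, q_gain=0.5):
    """Constant-velocity Kalman filter on 1D positions whose process noise
    grows with the estimated speed: a crude form of the adaptive behaviour
    needed for objects that switch between slow and fast motion."""
    x = np.array([measurements[0], 0.0])   # state: [position, velocity]
    P = np.eye(2) * 10.0                   # state covariance
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
    H = np.array([[1.0, 0.0]])             # we observe position only
    estimates = []
    for z in measurements:
        # Adapt process noise to the magnitude of the velocity estimate
        q = q_base + q_gain * abs(x[1])
        Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                          [dt**3 / 2, dt**2]])
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new measurement
        y = z - H @ x
        S = H @ P @ H.T + r
        K = P @ H.T / S
        x = x + (K * y).ravel()
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)
```

On a noiseless constant-velocity input the filter locks on within a few frames; the interesting behaviour is that a sudden speed change inflates Q, so the filter trusts new measurements more instead of dragging behind the old velocity.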
@mikel-brostrom Really appreciate your knowledge and expertise again.
For the next stage of the project I want to draw the golf swing path line on sample videos like those attached above. From your experience, do you have any recommendations, solutions, or even just a few keywords I can research on my own to draw the golf ball's path (exactly or approximately) during the swing?
From my research so far, the keywords I found relate to motion tracking rather than visual object tracking.
One of the results is this repo. It is C++ code (which is a bit hardcore for me), but if you have time, your opinion or recommendation on it would be very useful to me:
https://github.com/Nuzhny007/Multitarget-tracker
Thanks for your work
For the next stage of the project I want to draw the golf swing path line on sample videos like those attached above. From your experience, do you have any recommendations?
Have you tried SAHI on the swing video with the metric I just implemented? Given that the golf ball is considerably smaller there than in the rest of the videos, this approach could be a valid alternative. Let's start small and not over-complicate things.
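The core idea behind SAHI-style sliced inference can be sketched in a few lines (this is an illustration of the technique, not the SAHI library's API; `detect_fn` is an assumed callable, e.g. a YOLO wrapper, returning an (N, 5) array of [x1, y1, x2, y2, conf] boxes in crop coordinates):

```python
import numpy as np

def sliced_detect(frame, detect_fn, tile=640, overlap=0.2):
    """Run the detector on overlapping tiles of the frame and shift the
    resulting boxes back to full-frame coordinates. Small objects occupy
    more pixels relative to each tile, which is why this helps tiny-ball
    detection (at the cost of running the detector several times)."""
    h, w = frame.shape[:2]
    step = int(tile * (1 - overlap))
    boxes = []
    for y0 in range(0, max(h - tile, 0) + 1, step):
        for x0 in range(0, max(w - tile, 0) + 1, step):
            crop = frame[y0:y0 + tile, x0:x0 + tile]
            det = detect_fn(crop)
            if len(det):
                det = det.copy()
                det[:, [0, 2]] += x0  # shift x back to full-frame coords
                det[:, [1, 3]] += y0  # shift y back to full-frame coords
                boxes.append(det)
    # A real implementation would also NMS-merge duplicates from the overlap
    return np.vstack(boxes) if boxes else np.zeros((0, 5))
```

The extra detector passes per frame are exactly the latency cost discussed below, so the tile size and overlap become a speed/recall trade-off.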
@mikel-brostrom, I already applied SAHI. Although the results were good, the inference delay is not compatible with my real-time camera stream application, so I considered designing my own algorithm:
For the first few frames I run YOLO as normal.
Once the ball is detected successfully, I create a window (a 'detect area') and run detection only inside it. When the ball is hit and moves away from the center of the detect area, I move the detect area to follow the ball. When the ball becomes too small and detection is lost, I hard-code a 'drop down' effect like in the PGA Tour sample video below:
https://github.com/mikel-brostrom/yolo_tracking/assets/86455738/9a600ef2-08f9-4b97-a09f-10cb175ea137
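The moving 'detect area' described above can be sketched as a simple ROI re-centering step (all names and the margin parameter here are illustrative, not part of any library):

```python
import numpy as np

def update_roi(roi, ball_box, frame_w, frame_h, margin=2.0):
    """Re-center the detection window on the last ball detection.
    roi and ball_box are [x1, y1, x2, y2]; margin scales the window
    relative to the ball size, and the window never shrinks below its
    previous size so a fast ball stays inside it between frames."""
    bx = (ball_box[0] + ball_box[2]) / 2   # ball centroid
    by = (ball_box[1] + ball_box[3]) / 2
    bw = ball_box[2] - ball_box[0]
    bh = ball_box[3] - ball_box[1]
    half_w = max(bw * margin, (roi[2] - roi[0]) / 2)
    half_h = max(bh * margin, (roi[3] - roi[1]) / 2)
    # Center the window on the ball, clamped to the frame bounds
    return np.array([np.clip(bx - half_w, 0, frame_w),
                     np.clip(by - half_h, 0, frame_h),
                     np.clip(bx + half_w, 0, frame_w),
                     np.clip(by + half_h, 0, frame_h)])
```

Each frame would then crop the frame to this ROI, run the detector on the crop only, and shift detections back to frame coordinates before tracking.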
From a technical point of view, what do you think about this logic? Is it feasible? I will write some code for the algorithm with ChatGPT's support.
If it works well using SAHI, then this:
Once the ball is detected successfully, I create a window (a 'detect area') and run detection only inside it. When the ball is hit and moves away from the center of the detect area, I move the detect area to follow the ball.
I guess should work too.
I hard code the 'drop down' effect
This I believe will be very difficult to achieve with a realistic outcome. But sure, try it out.
Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
You can now try:
from boxmot import OCSORT

tracker = OCSORT(
    asso_func="centroid",
    iou_threshold=0.3  # use this to set the centroid threshold that matches your use case best
)
iou_threshold
now acts as a centroid distance threshold rather than an overlap threshold. An iou_threshold of 0.7 means that only objects within 30% of the image's diagonal in pixels will be accepted as possible matches.
@mikel-brostrom Hi, may I ask which file this code should be added to?
Just substitute the tracker in any of the examples: https://github.com/mikel-brostrom/yolo_tracking#custom-object-detection-model-tracking-example by any of the supported trackers with this association function: #1246 (comment) @050603
@mikel-brostrom Thanks a lot
@mikel-brostrom Hello Mikel, now that I am moving my project to production, can I open another question to discuss how to keep the tracker and detector processing in real time (high frame rate, low latency) on streaming frames from an IP camera?
Sure @tgbaoo !