facebookresearch/co-tracker

How to better use CoTracker in our projects (traffic tracking)

learnuser1 opened this issue · 10 comments

Hello, when I’m using a specific number of points to track in a video for autonomous driving inference, I’ve noticed that once the target object is occluded, the tracked points start following other objects, resulting in poor tracking performance. Are there any good methods to address this issue of poor performance after occlusion? Which approach, using points or a grid, yields better results in tracking when facing occlusion?

fenaux commented

Hi,
I face a similar problem with a side-view video of one person running.
When the left leg is in front, points on the right leg stay still after they have been occluded by the left leg. All points stay still if they are occluded by a vertical stick, even for a short time.

Hi @learnuser1, @fenaux,
Could you send me an example of such a video with estimated tracks? How many points are you tracking and what grid_size / local_grid_size do you use?
Also, have you tried other CoTracker configurations, such as stride_4_wind_12?

In my experience, tracking is always better with a non-zero grid_size because the model can identify similarly moving points and pay attention to background points to compensate for camera motion. A grid_size of 4/5/6 works best for a small number of points (less than 50). When tracking a handful of points, local_grid_size can also improve tracking performance.
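As a rough illustration of what a non-zero grid_size contributes, the sketch below builds a regular grid of extra query points over the frame and appends it to a handful of user queries. It is plain Python and not CoTracker's own implementation: `support_grid` and `with_support_points` are hypothetical helper names, the (frame, x, y) row layout is an assumption, and placing the grid on the earliest query frame is just one possible choice.

```python
def support_grid(grid_size, width, height, frame=0):
    """Return grid_size x grid_size rows [frame, x, y] spread evenly over the image."""
    if grid_size < 1:
        return []
    if grid_size == 1:
        # A single support point goes to the image center.
        return [[float(frame), (width - 1) / 2.0, (height - 1) / 2.0]]
    rows = []
    for j in range(grid_size):
        for i in range(grid_size):
            x = i * (width - 1) / (grid_size - 1)
            y = j * (height - 1) / (grid_size - 1)
            rows.append([float(frame), x, y])
    return rows


def with_support_points(queries, grid_size, width, height):
    """Append a support grid (hypothetical helper) to the user's own queries."""
    # Assumption: the grid starts on the earliest frame that has a query.
    first_frame = min(q[0] for q in queries)
    return list(queries) + support_grid(grid_size, width, height, first_frame)


# Two hand-picked points plus a 5x5 grid of context points on a 256x256 frame.
queries = [[2.0, 127.0, 124.0], [2.0, 96.0, 141.0]]
all_points = with_support_points(queries, 5, 256, 256)
print(len(all_points))
```

Only the first rows are the points you care about; the grid rows exist to give the model context (background and co-moving points) and can be dropped from the output afterwards.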

fenaux commented

Thanks, I will try local_grid_size. In the meantime, here are two samples. The queries are the keypoints obtained from a top-down human pose estimator.

queries_run_pred_track.mp4
query_run_backward_pred_track.mp4

Thanks @fenaux, could you also send the original video with queried points? I'll try to make it work better

fenaux commented

@nikitakaraevv Many thanks for the interest you are showing in my questions.
Here is the video (I do not know if it uploaded correctly!)

1_crop.mp4

queries from frame number 2
tensor([[ 2., 127., 124.],
[ 2., 127., 124.],
[ 2., 96., 141.],
[ 2., 156., 163.],
[ 2., 86., 180.],
[ 2., 195., 167.],
[ 2., 66., 186.],
[ 2., 219., 179.],
[ 2., 87., 185.],
[ 2., 199., 164.]])

queries from last frame
tensor([[229., 99., 112.],
[229., 99., 112.],
[229., 57., 122.],
[229., 120., 154.],
[229., 51., 172.],
[229., 151., 156.],
[229., 35., 186.],
[229., 163., 158.],
[229., 54., 164.],
[229., 158., 174.]])
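For anyone reproducing this setup, here is a minimal sketch of how pose keypoints could be turned into query rows like the ones above. It assumes each row is (frame index, x, y) in pixels; `keypoints_to_queries` is a hypothetical helper, not part of CoTracker.

```python
def keypoints_to_queries(keypoints, frame_idx):
    """Turn (x, y) pixel keypoints from one frame into [frame, x, y] query rows."""
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    return [[float(frame_idx), float(x), float(y)] for x, y in keypoints]


# First three distinct keypoints of the frame-2 queries posted above.
pose_frame_2 = [(127, 124), (96, 141), (156, 163)]
rows = keypoints_to_queries(pose_frame_2, frame_idx=2)
print(rows)
```

The resulting list can be wrapped with `torch.tensor(rows)[None]` to add a batch dimension before passing it to the model as queries.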

fenaux commented

@nikitakaraevv here is a case where your method works nicely 🥇
Each runner is initialized with a local grid [10,5].
Note that the runner in lane 9 is lost right at the finish line. Conventional trackers lose the runners in lanes 5 and 6 when they pass the Olympic rings.

Tokyo_final_tracks_filt.mp4
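A per-runner local grid of this kind could be built roughly as follows. This is a sketch under assumptions: [10,5] is read as 10 rows by 5 columns, each runner is given as an (x0, y0, x1, y1) bounding box, and `box_grid` is a hypothetical helper rather than anything shipped with CoTracker.

```python
def box_grid(box, rows, cols, frame=0):
    """Return rows*cols query rows [frame, x, y] at the cell centers of a bounding box."""
    x0, y0, x1, y1 = box
    points = []
    for r in range(rows):
        for c in range(cols):
            # Cell centers keep every point strictly inside the box.
            x = x0 + (c + 0.5) * (x1 - x0) / cols
            y = y0 + (r + 0.5) * (y1 - y0) / rows
            points.append([float(frame), x, y])
    return points


# One 10x5 grid per runner; the boxes here are made-up example values.
runner_boxes = [(0, 0, 50, 100), (60, 0, 110, 100)]
queries = [p for box in runner_boxes for p in box_grid(box, rows=10, cols=5)]
print(len(queries))
```

Keeping the points of each runner in a contiguous block of 50 rows makes it easy to aggregate the tracks per runner afterwards, for example by taking the median position of each block.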

Hi @fenaux, thank you for sending me these videos! I've tried different things but unfortunately wasn't able to significantly improve the result with simple tricks. I'll keep working on CoTracker, so stay tuned!
If you find other examples where the model doesn't work well, please let me know!
I'm closing the issue for now.

@fenaux I'm working on a similar problem, to track athletes' body joints and use the tracks for motion kinematics analysis. Can I hit you up and discuss more about it?

fenaux commented

@zhuolisam Yes, please give me an email address, for instance.

@fenaux Here is my email: zhuolisam0627@gmail.com