YuliangXiu/PoseFlow

Tracking and speed issues

Closed this issue · 1 comment

I have a video from a sports game and I want to track players.
I used AlphaPose to generate pose data.
I have two issues with PoseFlow:

  1. Speed.
    On my Windows PC it does not use the GPU, and a 23-second video (1024x576, 24 fps) takes more than 10 hours to process (Intel i5, 4 cores at 4.2 GHz).
    Is this normal? Can it run on a GPU? Would it run better on Linux with a GPU?

For comparison, I wrote a simple script that does the same task with a basic algorithm: it finds the closest person from the previous frames, and if the distance is below a threshold, it considers it the same person (roughly like the sketch after these two points). It takes less than 1 second for the video mentioned above, and the resulting data is actually not much worse than PoseFlow's.

  2. There are many tracking issues. The basic one: often a person simply runs across the court, no jumps or anything, and suddenly it becomes another person. I run PoseFlow with default parameters; are there any parameters that could help?
    A worse issue is that PoseFlow often reassigns IDs ("idx") to different people. For part of the game, player #2 is a player in a white shirt in the left corner; in another part, it considers another player, in a black shirt in the right corner, to be #2. This completely messes up my research. Is there anything that can be done?
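For reference, here is roughly what my comparison script does (a minimal sketch; the AlphaPose JSON field names, the keypoint layout, and the distance threshold are assumptions, and the greedy matching can give two people the same id):

```python
# Minimal sketch of the nearest-neighbor matching described above.
# Assumes AlphaPose-style results: one dict per detection with an "image_id"
# and a flat "keypoints" list [x1, y1, c1, x2, y2, c2, ...].
import json
import math
from collections import defaultdict

DIST_THRESHOLD = 60.0  # pixels; hand-picked, depends on resolution and frame rate

def center(person):
    """Mean (x, y) over all keypoints of one detection."""
    kps = person["keypoints"]
    xs, ys = kps[0::3], kps[1::3]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def track(results_path):
    with open(results_path) as f:
        detections = json.load(f)

    # Group detections by frame (assumes image ids sort in frame order).
    frames = defaultdict(list)
    for det in detections:
        frames[det["image_id"]].append(det)

    next_id = 0
    prev = []  # (track_id, center) pairs from the previous frame
    for image_id in sorted(frames):
        current = []
        for person in frames[image_id]:
            cx, cy = center(person)
            # Greedily pick the closest person from the previous frame.
            best_id, best_dist = None, float("inf")
            for track_id, (px, py) in prev:
                d = math.hypot(cx - px, cy - py)
                if d < best_dist:
                    best_id, best_dist = track_id, d
            if best_id is not None and best_dist < DIST_THRESHOLD:
                person["idx"] = best_id    # same person as in the previous frame
            else:
                person["idx"] = next_id    # too far away: start a new track
                next_id += 1
            current.append((person["idx"], (cx, cy)))
        prev = current
    return detections
```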

Another question about usage, not that important: the docs say that to use PoseFlow I need to download the PoseTrack dataset. Is this really required? It is 86 GB and I did not download it, yet PoseFlow works without it.

Thank you!


Thanks for using PoseFlow!

I tested PoseFlow on the PoseTrack dataset (86 GB; this is a benchmark dataset released for research, so there is no need to download this big dataset if you want to run the algorithm on your own data) under my Ubuntu system with an Intel Core i9-7900X CPU @ 3.3 GHz × 20. I think the processing time will increase a lot in crowded scenes, because my tracking scheme tries to find the matching person across several adjacent frames (previous/next).
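Conceptually, the extra cost comes from comparing each person against candidates in a window of neighboring frames rather than only the previous one; a rough sketch of that idea (not the actual PoseFlow code; the window size and the similarity function here are placeholders):

```python
# Rough sketch of matching against a window of adjacent frames
# (illustrative only; window size and similarity() are placeholders).
def best_match(person, frame_index, frames, similarity, window=2):
    """Search a few previous and next frames for the most similar detection.

    frames: list of per-frame detection lists; similarity: callable returning a score.
    """
    best, best_score = None, float("-inf")
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        neighbor = frame_index + offset
        if not 0 <= neighbor < len(frames):
            continue
        for candidate in frames[neighbor]:
            score = similarity(person, candidate)  # e.g. box IoU plus pose distance
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score
```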

Yes, finding the closest box from adjacent frames is an intuitive solution, but if you want better tracking accuracy, then whether through "better feature extraction" or "better matching schemes", it all requires more time and memory; this is an accuracy vs. time/memory trade-off.

If you want better tracking results, maybe you can first use a SOTA MOT (multi-object tracking) algorithm to track the predicted human bounding boxes, and then use a single-person pose estimation (SPPE) framework to get the final keypoints. I have implemented an SPPE; you can use it for fast single-person pose estimation.
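As a rough sketch of that two-stage idea (the tracker and pose-estimator functions here are hypothetical placeholders, not a specific library):

```python
# Sketch of the suggested two-stage pipeline: track boxes first with an MOT
# algorithm, then run single-person pose estimation (SPPE) inside each box.
# run_mot_tracker() and estimate_pose() are hypothetical placeholders for
# whichever tracker and SPPE model you actually use.
def track_then_pose(frames, run_mot_tracker, estimate_pose):
    """frames: list of images; returns per-frame lists of (track_id, keypoints)."""
    # Stage 1: the MOT tracker assigns a stable id to each person box per frame.
    tracked_boxes = run_mot_tracker(frames)  # per frame: [(track_id, (x1, y1, x2, y2)), ...]

    # Stage 2: crop each tracked box and estimate the pose of the single person inside.
    results = []
    for image, boxes in zip(frames, tracked_boxes):
        frame_result = []
        for track_id, (x1, y1, x2, y2) in boxes:
            crop = image[int(y1):int(y2), int(x1):int(x2)]
            frame_result.append((track_id, estimate_pose(crop)))
        results.append(frame_result)
    return results
```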

Good luck!