Tracking People by Predicting 3D Appearance, Location & Pose (CVPR 2022 Oral)

Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose".
Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik.

This repository provides an implementation of our paper, PHALP. It covers installation, dataset preparation, and evaluation on datasets, and includes a demo that runs on any YouTube video.

Abstract : In this paper, we present an approach for tracking people in monocular videos, by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust way. This lifting includes information about the 3D pose of the person, his or her location in the 3D space, and the 3D appearance. As we track a person, we collect 3D observations over time in a tracklet representation. Given the 3D nature of our observations, we build temporal models for each one of the previous attributes. We use these models to predict the future state of the tracklet, including 3D location, 3D appearance, and 3D pose. For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner. Association is solved with simple Hungarian matching, and the matches are used to update the respective tracklets. We evaluate our approach on various benchmarks and report state-of-the-art results.
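As an illustration of the association step described in the abstract, here is a minimal sketch of Hungarian matching between predicted tracklet states and the detections in a new frame. It is not the repository's implementation: the function names (hungarian, associate) and the sample coordinates are hypothetical, and plain Euclidean distance between 3D locations stands in for the probabilistic similarity over appearance, location, and pose that the paper uses. The sketch also assumes there are at least as many detections as tracklets.

```python
from math import dist, inf


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns (assignment, total) where assignment[i] is the column matched
    to row i. Classic O(n^2 * m) potentials-based Hungarian algorithm.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "this sketch assumes no more rows than columns"
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


def associate(predicted, detections):
    """Match each predicted 3D location to one detection (distance as cost)."""
    cost = [[dist(p, d) for d in detections] for p in predicted]
    assignment, _ = hungarian(cost)
    return assignment


# Hypothetical predicted tracklet locations and single-frame detections (x, y, z).
predicted = [(0.0, 0.0, 5.0), (2.0, 0.0, 6.0)]
detections = [(2.1, 0.1, 6.0), (0.1, 0.0, 5.1), (9.0, 9.0, 9.0)]
matches = associate(predicted, detections)
print(matches)  # [1, 0]
```

Here tracklet 0 is matched to detection 1 and tracklet 1 to detection 0. The matched detections would then update their tracklets, while the unmatched detection (index 2) is left for the tracker to handle separately.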

Installation

We recommend creating a clean conda environment and installing all dependencies in it. You can do this as follows:

conda env create -f scripts/_env.yaml

After the installation is complete you can activate the conda environment by running:

conda activate PHALP

Demo

Run the following command to apply our method to a YouTube video. It downloads the video for a given YouTube ID, extracts frames, runs Detectron2 and HMAR, then runs our tracker and renders the result.

./scripts/_PHALP.sh

You can also render with a different renderer (NMR or PyRender) and a different visualization by changing the render_type option. Additionally, you can replace HUMAN with GHOST to see continuous tracks, even when a person is occluded or not detected.

Testing

Once the PoseTrack dataset is downloaded to "_DATA/posetrack_2018/", run the following command to run our tracker on all videos of a supported dataset (here, posetrack-val). This runs MaskRCNN and HMAR to create embeddings, then runs PHALP on the preprocessed data.

python test_datasets.py --track_dataset posetrack-val

Evaluation

To evaluate tracking performance in terms of ID switches, MOTA, IDF1, and HOTA, run the following command:

python3 evaluate_PHALP.py out/Videos_results/results/ PHALP posetrack
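For reference, two of these metrics have simple closed forms under their standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2 IDTP / (2 IDTP + IDFP + IDFN). The sketch below computes them from raw counts. The function names and the counts are made up for illustration; they are not taken from evaluate_PHALP.py or from any reported results. HOTA is omitted because it needs association statistics at multiple localization thresholds.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts and GT box count."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """ID F1 score from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("no detections and no ground truth")
    return 2 * idtp / denom


# Hypothetical counts, for illustration only.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
print(round(example_mota, 3), round(example_idf1, 3))  # 0.8 0.8
```

Note that MOTA can be negative when the total number of errors exceeds the number of ground-truth boxes, whereas IDF1 always lies between 0 and 1.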

Results (Project site)

We evaluated our method on the PoseTrack, MuPoTS, and AVA datasets. Our results show significant improvements over state-of-the-art methods for person tracking. For more results, please visit our website.

Acknowledgements

Parts of the code are taken or adapted from the following repos:

Contact

Jathushan Rajasegaran - jathushan@berkeley.edu or brjathu@gmail.com
To ask questions or report issues, please open an issue on the issue tracker.
Discussions, suggestions and questions are welcome!

Citation

If you find this code useful for your research, or if you use data generated by our method, please consider citing the following paper:

@inproceedings{rajasegaran2022tracking,
  title={Tracking People by Predicting 3{D} Appearance, Location \& Pose},
  author={Rajasegaran, Jathushan and Pavlakos, Georgios and Kanazawa, Angjoo and Malik, Jitendra},
  booktitle={CVPR},
  year={2022}
}
