KPR: Keypoint Promptable Re-Identification

🔥 SOTA ReID model, with optional keypoint prompts, that is robust to multi-person occlusions 🔥

✅ Plug-and-play with any pose estimator or through manual keypoint selection

🚀 Release of the Occluded-PoseTrack-ReID dataset with manual keypoint annotations + keypoint annotations for four existing ReID benchmarks

KPR can be easily integrated into any codebase to perform person retrieval, person search, multi-person (pose) tracking, multi-view multi-skeleton matching for 3D pose, ...

SOTA performance with the SOLIDER human-centric foundation model as backbone.

Keypoint Promptable Re-Identification, ECCV24

Vladimir Somers, Alexandre Alahi, Christophe De Vleeschouwer

arxiv 2407.18112

State-of-the-art performance on 5 datasets:

Occluded-PoseTrack-ReID:

Occluded-Duke:

Occluded ReID:

Partial-ReID:

Market1501:

Update

[2024.08.23] Full codebase release
[2024.07.10] Dataset and weights release

What's next

We plan to extend this codebase in the near future. Star the repository to stay updated on future changes:

release of the multi-person pose tracking codebase that is based on TrackLab.

🚀 Demo

We provide a simple demo script to demonstrate how to use KPR with the provided pre-trained weights to compute the similarity score between multiple person images and therefore perform person re-identification. The keypoint prompts are optional and can be used only when necessary, e.g. when there is a multi-person occlusion scenario. More information is provided in the demo file.

Introduction to Keypoint Promptable Re-Identification

Welcome to the official repository of our ECCV24 paper "Keypoint Promptable Re-Identification". In this work, we propose KPR, a keypoint promptable method for part-based person re-identification. KPR is a Swin Transformer-based model that takes an RGB image along with a set of semantic keypoints (i.e. the prompt) as input. It then produces a set of part-based features, each representing a distinct body part of the ReID target, along with their respective visibility scores. A visibility score indicates whether a body part is visible or occluded in the input image, so that only visible parts are considered when comparing two persons. Our method can process both positive and negative keypoints, which respectively represent the target and non-target pedestrians. Furthermore, KPR is designed to be prompt-optional to offer more practical flexibility: the same model can be used without a prompt on non-ambiguous images, or with a prompt when dealing with occlusions, while consistently achieving state-of-the-art performance in both cases.

Multi-Person Ambiguity (MPA)

Our method is designed to be robust to any type of occlusion, including occlusions involving multiple persons that cause Multi-Person Ambiguity. Multi-Person Ambiguity (MPA) arises when multiple individuals are visible in the same bounding box, making it challenging to determine the intended ReID target among the candidates. To address this issue, KPR is fed with additional keypoint prompts indicating the intended ReID target.
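As a rough illustration of how positive and negative keypoints could be turned into a prompt, the sketch below renders each keypoint set as a Gaussian heatmap and stacks the two into a positive (target) and a negative (non-target) channel. This is a minimal sketch under assumptions, not the encoding used in the codebase: the function names, the two-channel layout and the `sigma` value are all hypothetical, and the actual prompt format is defined in the paper and the code.

```python
import math


def gaussian_heatmap(keypoints, height, width, sigma=2.0):
    """Render (x, y) keypoints as one heatmap, taking the max over keypoints."""
    heatmap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                value = math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def build_prompt(positive_kpts, negative_kpts, height, width):
    """Stack a positive (target) and a negative (non-target) channel.

    Hypothetical layout: channel 0 = target keypoints, channel 1 = others.
    """
    return [
        gaussian_heatmap(positive_kpts, height, width),
        gaussian_heatmap(negative_kpts, height, width),
    ]


# One keypoint on the target and one on an occluding pedestrian.
prompt = build_prompt([(2, 3)], [(6, 1)], height=8, width=8)
print(prompt[0][3][2], prompt[1][1][6])  # each channel peaks at its keypoint
```

With no prompt available, both channels would simply stay at zero in this sketch, which mirrors the prompt-optional usage described above.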

New Occluded-PoseTrack-ReID dataset

To encourage further research on promptable ReID, we release our proposed Occluded-PoseTrack-ReID dataset, a multi-person occluded ReID dataset with explicit target identification through manual keypoint annotations. Furthermore, we propose new keypoint annotations for four popular re-identification datasets (Market-1501, Occluded-Duke, Occluded-ReID and Partial-ReID) to provide a common setup for researchers to compare their promptable ReID methods.

Part-based Re-Identification

Our work is built on top of BPBreID, a strong baseline for part-based person re-identification. As illustrated in the figure below, part-based ReID methods output multiple embeddings per input sample, i.e. one for each part, whereas standard global methods only output a single embedding. Compared to global methods, part-based ones come with some advantages:

They achieve explicit appearance feature alignment for better ReID accuracy.
They are robust to occlusions, since only mutually visible parts are used when comparing two samples.

Our model KPR uses pseudo human parsing labels at training time to learn an attention mechanism. This attention mechanism has K branches to pool the global spatial feature map into K body part-based embeddings. Based on the activations of the attention maps, a visibility score is computed for each part. At test time, no human parsing labels are required. The final similarity score between two person images is computed using the average distance of all mutually visible part-based embeddings. Please refer to our paper and to BPBreID for more information.
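The comparison of two samples described above can be sketched in a few lines of plain Python. This is a simplified illustration, assuming binary visibility flags and Euclidean distance; the function name and the toy embeddings are hypothetical, and the actual implementation in the codebase may use a different metric and soft visibility scores.

```python
import math


def part_based_distance(query_parts, gallery_parts, query_vis, gallery_vis):
    """Average Euclidean distance over the parts visible in both images.

    query_parts / gallery_parts: list of K embeddings (each a list of floats).
    query_vis / gallery_vis: list of K booleans (True = part is visible).
    Returns None when no part is visible in both images.
    """
    distances = []
    for q, g, qv, gv in zip(query_parts, gallery_parts, query_vis, gallery_vis):
        if qv and gv:  # keep mutually visible parts only
            distances.append(math.dist(q, g))
    if not distances:
        return None
    return sum(distances) / len(distances)


# Three parts with 2-D toy embeddings; the third part is occluded in the gallery image.
query = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gallery = [[3.0, 4.0], [1.0, 1.0], [0.0, 0.0]]
print(part_based_distance(query, gallery, [True, True, True], [True, True, False]))  # 2.5
```

The occluded third part is ignored, so its large distance does not penalize the match: only the first two parts (distances 5.0 and 0.0) contribute to the average.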

What to find in this repository

In this repository, we propose a framework and a strong baseline to support further research on keypoint promptable ReID methods. Our code is based on BPBreID and the popular Torchreid framework for person re-identification. In this codebase, we provide several adaptations to the original framework to support promptable part-based ReID methods. Changes compared to Torchreid:

The ImagePartBasedEngine to train/test part-based models with prompts, compute query-gallery distance matrix using multiple features per test sample with support for visibility scores.
The fully configurable GiLt loss to selectively apply id/triplet loss on holistic (global) and part-based features.
The BodyPartAttentionLoss to train the attention mechanism.
The KPR part-based promptable model to compute part-based features with support for keypoint prompts as input, body-part learnable attention, fixed attention heatmaps from an external model, PCB-like horizontal stripes, etc.
The Albumentations library used for data augmentation, which jointly transforms the image, keypoint prompts and human parsing labels.
Support for Weights & Biases and other logging tools in the Logger class.
An EngineState class to keep track of training epoch, etc.
A new ranking visualization tool to display part heatmaps, prompts, local distance for each part and other metrics (example image here).
For more information about all available configuration and parameters, please have a look at the default config file.

You can also have a look at the original Torchreid README for additional information, such as documentation, how-to instructions, etc. Be aware that some of the original Torchreid functionality and models might be broken (for example, we don't support video re-id yet).

Installation instructions

Installation

Make sure conda is installed.

# clone this repository
git clone https://github.com/VlSomers/keypoint_promptable_reidentification

# create conda environment
cd keypoint_promptable_reidentification/ # enter project folder (named after the repository by git clone)
conda create --name kpr python=3.10 pytorch==1.13.0 torchvision==0.14.0 pytorch-cuda=11.7 -c pytorch -c nvidia -y
conda activate kpr

# install dependencies
# make sure `which python` and `which pip` point to the correct path
pip3 install -r requirements.txt

# (optional) install openpifpaf if you want to generate your own human parsing annotations
# this is not required for most installations since human parsing labels are provided for download
pip3 install -e "git+https://github.com/PbTrack/openpifpaf@pbtrack#egg=openpifpaf[test]"

# install torchreid (don't need to re-build it if you modify the source code)
python3 setup.py develop

Download the Occluded-PoseTrack Re-Identification dataset

Download PoseTrack21.
Download Occluded-PoseTrack ReID annotations.

Note

Unfortunately we don't have the right to host and share a subset of PoseTrack21. This is why PoseTrack21 should be first downloaded from the original repository and then turned into a ReID dataset with our script.

Our proposed dataset is derived from the PoseTrack21 dataset for multi-person pose tracking in videos. The original PoseTrack21 dataset should be first downloaded following these instructions. We also provide json files that specify how the pose tracking annotations should be turned into our ReID dataset. These json files describe which detections (bounding boxes + keypoints) should be used as train/query/gallery samples. We provide these files and the related human parsing labels on GDrive. These files are read by our codebase to extract the ReID dataset from the pose tracking one and save the corresponding image crops on disk before launching the ReID experiment. They can also be integrated in any external codebase in a similar manner. The transformation from a tracking to a ReID dataset is performed within the OccludedPosetrack21 class. The human parsing labels were generated using SAM and PifPaf; more details are provided in the paper.

Generate the dataset

Our codebase will generate the Occluded-PoseTrack Re-Identification dataset the first time you run the test or train code. By default, the codebase looks for the dataset under "~/datasets/reid/PoseTrack21". Either place your downloaded dataset there, or choose a new location with the cfg.data.root config, e.g. cfg.data.root = "/path/to/PoseTrack21" inside default_config.py, L177 (just specify the path to the folder containing the 'PoseTrack21' folder). Finally, put the downloaded Occluded-PoseTrack ReID annotations in the PoseTrack21 folder, under PoseTrack21/occluded_posetrack_reid. The ReID image crops will be extracted in that folder. Keypoint annotations are not saved in the occluded_posetrack_reid folder, but loaded at runtime from the PoseTrack21 json files inside posetrack_data. Feel free to open a GitHub issue if you encounter any issue during the dataset generation or need further information.

Here is an overview of the final folder structure

PoseTrack21
├── images                              # contains all images  
│   ├── train
│   ├── val
├── posetrack_data                      # contains annotations for pose reid tracking
│   ├── train
│   │   ├── 000001_bonn_train.json
│   │   ├── ...
│   ├── val
│       ├── ...
├── posetrack_mot                       # contains annotations for multi-object tracking 
│   ├── mot
│   │   ├── train
│   │   │   ├── 000001_bonn_train
│   │   │   │   ├── image_info.json
│   │   │   │   ├── gt
│   │   │   │       ├── gt.txt          # ground truth annotations in mot format
│   │   │   │       ├── gt_kpts.txt     # ground truth poses for each frame
│   │   │   ├── ...
│   │   ├── val
├── posetrack_person_search             # person search annotations
│   ├── query.json
│   ├── train.json
│   ├── val.json
├── occluded_posetrack_reid             # new occluded-posetrack-reid annotations
│   ├── images                          # image crops generated when running KPR for the first time
│   ├── masks                           # human parsing labels
│   ├── train_dataset_sampling.json     # which detection samples to use for training
│   ├── val_dataset_sampling.json       # which detection samples to use for evaluation, with a query/gallery split
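
Before launching the first run, it can help to check that the folders and files listed above are in place. The helper below is a small standalone sketch, not part of the codebase: the function name `missing_entries` and the choice of which entries to check are assumptions, with the paths copied from the overview above.

```python
from pathlib import Path

# Sub-paths taken from the folder overview above (images/ is generated on first run).
EXPECTED = [
    "images/train",
    "images/val",
    "posetrack_data/train",
    "posetrack_data/val",
    "occluded_posetrack_reid/masks",
    "occluded_posetrack_reid/train_dataset_sampling.json",
    "occluded_posetrack_reid/val_dataset_sampling.json",
]


def missing_entries(posetrack_root):
    """Return the expected sub-paths that do not exist under posetrack_root."""
    root = Path(posetrack_root).expanduser()
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Default dataset location mentioned in this README.
    missing = missing_entries("~/datasets/reid/PoseTrack21")
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```

If you moved the dataset, pass the path of your own PoseTrack21 folder instead of the default one.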

Download annotations for existing datasets

You can download the keypoint and human parsing labels for Market-1501, Occluded-Duke, Occluded-ReID and Partial-ReID on GDrive. The human parsing labels (.npy) were introduced by BPBreID. The keypoint annotations (.json) were generated with the PifPaf pose estimation model. When multiple skeletons are detected within a single bounding box, the one with its head closest to the top center part of the image is considered as the ReID target, and marked with an 'is_target' attribute. Around 10% of the query samples in the Occluded-Duke dataset were annotated manually because either the target person was not correctly labeled or the target person was not detected. For Partial-ReID, the first skeleton in the list is considered as the target. For more details, please have a look at the paper. After downloading, unzip the file and put the masks folder under the corresponding dataset directory. For instance, Market-1501 should look like this:

Market-1501-v15.09.15
├── bounding_box_test
├── bounding_box_train
├── external_annotation
│   └── pifpaf_keypoints_pifpaf_maskrcnn_filtering
│       ├── bounding_box_test
│       ├── bounding_box_train
│       └── query
├── masks
│   └── pifpaf_maskrcnn_filtering
│       ├── bounding_box_test
│       ├── bounding_box_train
│       └── query
└── query

The external_annotation folder contains the keypoint annotations (with one json file per dataset sample) that are used as prompts at both training and test/inference time. The masks folder contains the human parsing labels that are used at training time only to supervise the part-based attention mechanism. Also make sure to set the data.root config to your dataset root directory path, i.e., all your dataset folders (Market-1501-v15.09.15, Occluded_Duke, Occluded_REID, Partial_REID) should be under this path.
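
The target selection rule used for the automatic keypoint annotations (the skeleton whose head is closest to the top center of the image) can be sketched as follows. This is an illustration only: the function name, the `head_index` default (0, the nose in COCO keypoint ordering) and the "keypoints" field are assumptions, and only the 'is_target' attribute comes from the annotation description above.

```python
import math


def select_target(skeletons, image_width, head_index=0):
    """Return the index of the skeleton whose head is closest to the top center.

    skeletons: list of skeletons, each a list of (x, y) keypoints.
    head_index: position of the head keypoint in each skeleton
    (hypothetical default: 0, the nose in COCO ordering).
    """
    top_center = (image_width / 2, 0.0)
    head_distances = [math.dist(s[head_index], top_center) for s in skeletons]
    return head_distances.index(min(head_distances))


skeletons = [
    [(10.0, 40.0), (12.0, 80.0)],  # head low and far to the left
    [(33.0, 8.0), (30.0, 60.0)],   # head near the top center
]
target = select_target(skeletons, image_width=64)
# Hypothetical annotation layout: one dict per skeleton, flagged with 'is_target'.
annotations = [{"keypoints": s, "is_target": i == target} for i, s in enumerate(skeletons)]
print(target)  # 1
```

Such a heuristic can fail under heavy occlusion, which is consistent with the manual correction of part of the Occluded-Duke queries mentioned above.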

Download the pre-trained models

We also provide some state-of-the-art pre-trained models based on the Swin backbone. You can put the downloaded weights under a 'pretrained_models/' directory or specify the path to the pre-trained weights using the model.load_weights parameter in the yaml config. The configuration used to obtain the pre-trained weights is also saved within the .pth file: make sure to set model.load_config to True so that the parameters under the model.kpr part of the configuration tree will be loaded from this file.

Running KPR

Inference

You can test the above downloaded models on five popular ReID benchmarks using the following command:

conda activate kpr
python main.py --config-file configs/kpr/<pretraining>/kpr_<target_dataset>_test.yaml

For instance, for the Occluded-PoseTrack ReID dataset with the SOLIDER-based KPR model:

conda activate kpr
python main.py --config-file configs/kpr/solider/kpr_occ_posetrack_test.yaml

Configuration files for other datasets and pretraining weights are available under configs/kpr/. Make sure that model.load_weights in these yaml config files points to the pre-trained weights you just downloaded with the above instructions.

Training

The training configs for five datasets (Occluded-PoseTrack-ReID, Occluded-Duke, Market-1501, Occluded-ReID and Partial-ReID) are provided in the configs/kpr/ folder. A training procedure can be launched with:

conda activate kpr
python ./main.py --config-file configs/kpr/<pretraining>/kpr_<target_dataset>_train.yaml

For instance, for the Occluded-Duke dataset with the SOLIDER pretrained weights:

conda activate kpr
python main.py --config-file configs/kpr/solider/kpr_occ_duke_train.yaml

Make sure to download and install the human parsing labels for your training dataset before running this command.

Visualization tools

The ranking visualization tool is activated by default, with the test.visrank config set to True in the default_config.py file. As illustrated below, this tool displays the Top-K ranked samples as rows (K can be set via test.visrank_topk). The first row, with a blue background, is the query, and the following green/red rows indicate correct/incorrect matches. The attention maps for each test embedding (foreground, parts, etc.) are displayed in the row. The colored heatmaps depict the attention maps of each body part, with one color per part. An attention map has a green/red border when the corresponding part is visible/not visible. The first number under each attention map indicates the visibility score and the second number indicates the distance of the embedding to the corresponding query embedding. The distances under the images in the first column on the left are the global distances of each sample to the query, usually computed as the average of all other distances weighted by the visibility scores. If you need more information about the visualization tool, feel free to open an issue.

Further work

There are plenty of ideas to improve KPR, feel free to explore them:

Improve KPR cross-domain performance.
Explore the use of KPR for retrieval based on a single selected body part (e.g. click on the boots of a person and retrieve all persons with similar boots).
Make KPR work with various types of input prompts: bboxes, segmentation masks, language, etc.
Study how KPR would perform with a few non-semantic keypoints (a minimal number of clicks on any random part of the target body).
Build a more difficult dataset, with images from various domains and complex occlusions involving both humans and objects.
Design a more efficient prompting mechanism that does not require transforming keypoints into heatmaps (SAM-like).
...

Other works

Please have a look at our other works on re-identification and sport video analysis:

Notes

I will try to discuss here various implementation choices, future research directions, open issues, etc. Feel free to open a GitHub issue if you have any questions or suggestions.

SOLIDER was hard to fine-tune, and the "semantic_weight" parameter did not have much impact. It was necessary to freeze the keypoint tokenizer for the first 20 epochs (with the train.fixbase_epoch config) and use a small learning rate.
The codebase underwent a big refactoring before public release and the provided training configuration might be incomplete. Please let me know if you cannot reproduce the reported results.

Questions and suggestions

If you have any question/suggestion, or find any bug/issue with the code, please raise a GitHub issue in this repository. I'll be glad to help you as much as I can! I'll try to update the documentation regularly based on your questions.

Citation

If you use this repository for your research or wish to refer to our method KPR, please use the following BibTeX entry:

@misc{somers2024keypointpromptablereidentification,
      title={Keypoint Promptable Re-Identification}, 
      author={Vladimir Somers and Christophe De Vleeschouwer and Alexandre Alahi},
      year={2024},
      eprint={2407.18112},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.18112}, 
}

Feel free to also refer to BPBreID, our non-promptable baseline introduced at WACV 2023 as prior work:

@article{bpbreid,
    archivePrefix = {arXiv},
    arxivId = {2211.03679},
    author = {Somers, Vladimir and {De Vleeschouwer}, Christophe and Alahi, Alexandre},
    doi = {10.48550/arxiv.2211.03679},
    eprint = {2211.03679},
    isbn = {2211.03679v1},
    journal = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV23)},
    month = {nov},
    title = {{Body Part-Based Representation Learning for Occluded Person Re-Identification}},
    url = {https://arxiv.org/abs/2211.03679v1 http://arxiv.org/abs/2211.03679},
    year = {2023}
}

Acknowledgement

This codebase is a fork from BPBreID and Torchreid. We borrowed some code from SOLIDER and TransReID, thanks for their great contribution!