
Deep attention-aware feature learning (DAAF) is designed for the person re-identification (ReID) task. Our method adds two branches to the backbone network at the training stage to guide the backbone to learn both global and local attention-aware features. The Partial Attention Branch (PAB) forces separate groups of feature channels to focus on predefined body parts by predicting their corresponding keypoints, while the Holistic Attention Branch (HAB) predicts the person mask and restricts the backbone network to focus on person bodies instead of the background. PAB and HAB are used only during training, so the inference time and model size are the same as those of the backbone network alone.
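To make this train/test asymmetry concrete, here is a minimal MindSpore sketch of how such a model could be wired, assuming the branches are simply skipped at inference; all class and attribute names (`DAAF`, `embed_head`, `hab_decoder`, `pab_decoder`) are illustrative, not the repository's actual identifiers.

```python
import mindspore.nn as nn

class DAAF(nn.Cell):
    """Sketch: ReID model with train-only auxiliary attention branches."""
    def __init__(self, backbone, embed_head, hab_decoder, pab_decoder):
        super().__init__()
        self.backbone = backbone        # e.g. ResNet-50 up to the last conv stage
        self.embed_head = embed_head    # pooling + TriNet's fully connected layers
        self.hab_decoder = hab_decoder  # predicts the person mask (train only)
        self.pab_decoder = pab_decoder  # predicts keypoint heatmaps (train only)

    def construct(self, x):
        feat = self.backbone(x)
        embedding = self.embed_head(feat)
        if self.training:
            # auxiliary predictions supervise the backbone during training
            return embedding, self.hab_decoder(feat), self.pab_decoder(feat)
        # at test time only the ReID embedding is computed, so the inference
        # cost matches the plain backbone
        return embedding
```

Calling `model.set_train(True)` before training enables the auxiliary outputs; `model.set_train(False)` disables them for evaluation, which is why the inference time and model size stay unchanged.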

Paper: Deep attention aware feature learning for person re-identification

We implement our method on the basis of the widely used TriNet. TriNet consists of a backbone network, i.e. ResNet-50, followed by two fully connected layers. The parameters of the decoders in HAB and PAB are listed in the table below.

| layer    | #channels in           | #channels out                   | kernel size | stride |
|----------|------------------------|---------------------------------|-------------|--------|
| deconv 1 | 2048 (HAB) / 341 (PAB) | 64                              | 3x3         | 2      |
| deconv 2 | 64                     | 64                              | 3x3         | 2      |
| deconv 3 | 64                     | 64                              | 3x3         | 2      |
| deconv 4 | 64                     | 64                              | 3x3         | 2      |
| 1x1 conv | 64                     | 1 (HAB) / keypoint groups (PAB) | 1x1         | 1      |
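The following is a minimal MindSpore sketch of a decoder with these hyperparameters. Only the channel counts, kernel sizes, and strides come from the table; the ReLU between deconvolutions and the PAB output channel count used in the instantiation below are assumptions.

```python
import mindspore.nn as nn

class Decoder(nn.Cell):
    """Four stride-2 deconvolutions followed by a 1x1 prediction conv."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        layers = []
        c_in = in_channels
        for _ in range(4):
            # 3x3 deconv, stride 2: each layer doubles the spatial resolution
            layers.append(nn.Conv2dTranspose(c_in, 64, kernel_size=3, stride=2))
            layers.append(nn.ReLU())  # assumption: ReLU between deconvs
            c_in = 64
        # 1x1 conv maps the 64 channels to the prediction channels
        layers.append(nn.Conv2d(64, out_channels, kernel_size=1, stride=1))
        self.decoder = nn.SequentialCell(layers)

    def construct(self, x):
        return self.decoder(x)

# HAB: full 2048-channel feature map in, a single person-mask channel out
hab_decoder = Decoder(in_channels=2048, out_channels=1)
# PAB: one 341-channel group in; the output channel count follows the
# keypoint grouping in the table ("keypoint groups"), shown here as 6
pab_decoder = Decoder(in_channels=341, out_channels=6)
```

With `pad_mode='same'` (the MindSpore default for these layers), each stride-2 deconvolution doubles the spatial resolution, so the four layers upsample the backbone feature map by 16x.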

The Market-1501 dataset is used to train and test the model. Market-1501 contains 32668 annotated bounding boxes of 1501 identities collected by six cameras. 12936 images are used for training, and the query and gallery sets contain 3368 and 19732 images respectively.

Keypoint and mask annotations are generated by CPN (Cascaded Pyramid Network for Multi-Person Pose Estimation) and FCN (Fully Convolutional Networks for Semantic Segmentation), respectively. For more information, please refer to their projects: CPN and FCN. We have also uploaded the keypoint and mask annotations to Baiduyun; the password is qp8k.
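Assuming the masks are stored as jpgs mirroring the image folders and each training image has 17 keypoint heatmap jpgs (consistent with the data structure below), a loader could look like the following sketch. The keypoint file-naming pattern here is a guess and must be adapted to the actual files.

```python
import os
import numpy as np
from PIL import Image

def load_annotations(root, img_name, num_keypoints=17):
    """Sketch: load the FCN mask and CPN heatmaps for one training image."""
    stem = os.path.splitext(img_name)[0]
    # binary person mask produced by FCN, stored under mask-anno/
    mask = np.array(Image.open(os.path.join(
        root, "mask-anno", "bounding_box_train", img_name)).convert("L"))
    mask = (mask > 127).astype(np.float32)
    # per-keypoint heatmaps produced by CPN; the "{stem}_{k}.jpg" naming
    # pattern is an assumption, not the repository's actual scheme
    heatmaps = [np.array(Image.open(os.path.join(
        root, "Market_cpn_keypoints", "bounding_box_train_256_2",
        f"{stem}_{k}.jpg")).convert("L"), dtype=np.float32) / 255.0
        for k in range(num_keypoints)]
    return mask, np.stack(heatmaps)
```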

Data structure:

```text
Datasets
├── Market-1501
│   ├── bounding_box_test [19732 jpgs]
│   ├── bounding_box_train [12936 jpgs]
│   ├── gt_bbox [25259 jpgs]
│   ├── gt_query [6736 mats]
│   ├── query [3368 jpgs]
│   └── readme.txt
├── mask-anno
│   ├── bounding_box_test [19732 jpgs]
│   ├── bounding_box_train [12936 jpgs]
│   └── query [3368 jpgs]
└── Market_cpn_keypoints
    └── bounding_box_train_256_2 [12936*17=219912 jpgs]
```
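As a quick sanity check after downloading, the jpg counts from the tree above can be verified with a few lines of Python using only the standard library; the `Datasets` root path is taken from the listing and may need adjusting.

```python
import os

# expected jpg counts per folder, taken from the tree above
EXPECTED = {
    "Market-1501/bounding_box_test": 19732,
    "Market-1501/bounding_box_train": 12936,
    "Market-1501/gt_bbox": 25259,
    "Market-1501/query": 3368,
    "mask-anno/bounding_box_test": 19732,
    "mask-anno/bounding_box_train": 12936,
    "mask-anno/query": 3368,
    "Market_cpn_keypoints/bounding_box_train_256_2": 12936 * 17,
}

def check_layout(root="Datasets"):
    for rel, expected in EXPECTED.items():
        path = os.path.join(root, rel)
        found = sum(f.endswith(".jpg") for f in os.listdir(path))
        status = "ok" if found == expected else "MISMATCH"
        print(f"{rel}: {found}/{expected} jpgs [{status}]")

if __name__ == "__main__":
    check_layout()
```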

For convenience, we provide a Docker image that includes all the environment requirements for running the experiments with MindSpore. The image is also uploaded to Baiduyun; the password is qp8k.

```bash
# load the image
docker load -i name.tar

# create a container
docker run -it -d --cap-add sys_ptrace --name=DAAF_mindspore --runtime=nvidia --ipc=host -p 6022:22 -v /home/cyf:/home/cyf 1683c3860cc5 /bin/bash
```

The model uses a ResNet-50 backbone pre-trained on ImageNet2012. Link
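Since the training script below expects a `.pth` file, here is a hedged sketch of loading the pre-trained weights in MindSpore, assuming the PyTorch checkpoint has first been converted to a MindSpore `.ckpt` with matching parameter names (the conversion itself is not shown).

```python
import mindspore as ms

def load_pretrained_backbone(net, ckpt_path):
    """Sketch: load converted ImageNet weights into the ResNet-50 backbone."""
    param_dict = ms.load_checkpoint(ckpt_path)
    # parameters absent from the checkpoint (e.g. the HAB/PAB decoders and
    # the FC head) keep their fresh initialization
    ms.load_param_into_net(net, param_dict)
    return net
```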

```bash
# run the training example
bash scripts/run_standalone_train_gpu.sh 0 /path/to/market1501/ /path/to/output/ /path/to/pretrained_resnet50.pth

# run the distributed training example
bash scripts/run_distribute_train_gpu.sh 8 /path/to/market1501/ /path/to/output/ /path/to/pretrained_resnet50.pth

# run the evaluation example
bash scripts/run_eval_gpu.sh /your/path/checkpoint_file
```

If you find this code useful in your research, please consider citing our paper:

```bibtex
@article{chen2022deep,
  title={Deep attention aware feature learning for person re-identification},
  author={Chen, Yifan and Wang, Han and Sun, Xiaolu and Fan, Bin and Tang, Chu and Zeng, Hui},
  journal={Pattern Recognition},
  volume={126},
  pages={108567},
  year={2022},
  publisher={Elsevier}
}
```

If you have any questions, please contact us: open an issue on GitHub or send us an email.

Yifan Chen
EMAIL: chenyifan0627@gmail.com