Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation (ICCV 2021)

Highlight

The Modality Confusion Learning Network (MCLNet) is designed to learn modality-invariant features by simultaneously minimizing the inter-modality discrepancy and maximizing the cross-modality similarity among instances within a single framework. MCLNet learns discriminative representations end to end from two subspaces of the overall feature space: modality properties and identity properties.
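For intuition only, here is a minimal PyTorch sketch of a modality-confusion style objective: an auxiliary classifier tries to predict the modality of each feature, and the loss pushes its predictions toward a uniform distribution. The class name, feature dimension, and uniform-target KL formulation are illustrative assumptions, not the exact losses used in the paper or in train.py.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModalityConfusionLoss(nn.Module):
        """Illustrative sketch: confuse an auxiliary modality classifier so the
        backbone features carry as little modality information as possible."""

        def __init__(self, feat_dim=2048, num_modalities=2):
            super().__init__()
            # auxiliary head that tries to tell visible from infrared features
            self.modality_classifier = nn.Linear(feat_dim, num_modalities)

        def forward(self, features):
            # features: (batch, feat_dim) pooled backbone features
            log_probs = F.log_softmax(self.modality_classifier(features), dim=1)
            # confusion target: a uniform distribution over the modalities
            uniform = torch.full_like(log_probs, 1.0 / log_probs.size(1))
            # minimized when the classifier cannot distinguish the modalities
            return F.kl_div(log_probs, uniform, reduction="batchmean")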

⚙️ Setup environment

  • Clone this repo:

    git clone https://github.com/bitreidgroup/Modality-Confusion-Learning.git

    cd Modality-Confusion-Learning

  • Create a conda environment and activate it:

    conda env create -f environment.yaml

    conda activate mcl

We recommend Python 3.6, CUDA 10.0, cuDNN 7.6.5, PyTorch 1.2, and CUDA Toolkit 10.0.130 for the environment.
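To quickly confirm that the activated environment matches this recommendation, you can run a short check (a minimal sketch; the expected values simply follow the versions listed above):

    import torch
    print(torch.__version__)               # expected: 1.2.x
    print(torch.version.cuda)              # expected: 10.0
    print(torch.backends.cudnn.version())  # expected: 76xx for cuDNN 7.6.x
    print(torch.cuda.is_available())       # should be True on a CUDA-capable machine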

🔧 Preparing datasets

  • RegDB Dataset: The RegDB dataset can be downloaded from this website by submitting a copyright form.

    (Named: "Dongguk Body-based Person Recognition Database (DBPerson-Recog-DB1)" on their website).

We do not preprocess the RegDB dataset.

  • SYSU-MM01 Dataset: The SYSU-MM01 dataset can be downloaded from this website.

  • We preprocess the SYSU-MM01 dataset to speed up the training process.

    • If you do not need the camera identities, run the preprocessing script:

      python pre_process_sysu.py

    After running, the training data will be stored in ".npy" format.

    • If you need the camera identities, run:

      python pre_process_sysu_cam.py

    The camera identities will also be stored in ".npy" format.
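To sanity-check the preprocessing output, the generated arrays can be loaded with NumPy. The file names below are assumptions for illustration; check pre_process_sysu.py and pre_process_sysu_cam.py for the names they actually write.

    import numpy as np

    # hypothetical file names; the preprocessing scripts define the real ones
    imgs = np.load("train_rgb_img.npy")
    labels = np.load("train_rgb_label.npy")
    print(imgs.shape, imgs.dtype)                       # e.g. (N, H, W, 3) image array
    print(labels.shape, np.unique(labels).size, "ids")  # number of training identities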

⏳ Training

You may need to manually define the data path first.

python train.py --dataset sysu --lr 0.1 --method mcl --gpu 1

- --dataset: which dataset to use, "sysu" or "regdb".

- --lr: initial learning rate.

- --optim: optimizer choice, SGD or Adam.

- --method: method to run, "mcl" or the baseline.

- --arch: backbone network, ResNet18 or ResNet50.

- --resume: resume training from a checkpoint.

- --model_path: the path for saving checkpoints.

- --gpu: which GPU to run on.
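For example, a RegDB run with the same settings might look like the following (only the flags documented above are shown; check the argparse definitions in train.py for the full list and the exact accepted values):

python train.py --dataset regdb --lr 0.1 --method mcl --gpu 0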

💽 Testing

python test.py --mode all --resume 'model_path' --gpu 1 --dataset sysu

- --mode: "all" or "indoor", i.e., all-search or indoor-search mode (only for the SYSU-MM01 dataset).

- --trial: testing trial (only for RegDB dataset, no need for SYSU-MM01).

- --resume: the saved model path.
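For RegDB, a testing command might look like the following (the trial index is an illustrative value; pass the path of your saved checkpoint to --resume):

python test.py --dataset regdb --trial 1 --resume 'model_path' --gpu 1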

💾 GPUs

All our experiments were performed on a single NVIDIA GeForce RTX 2080 Ti.

Training dataset    Approximate GPU memory    Approximate training time
SYSU-MM01           9 GB                      24 hours
RegDB               6 GB                      5 hours

Citation

If this repository helps your research, please cite:

@inproceedings{hao2021cross,
  title={Cross-modality person re-identification via modality confusion and center aggregation},
  author={Hao, Xin and Zhao, Sanyuan and Ye, Mang and Shen, Jianbing},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16403--16412},
  year={2021}
}

📄 References

  1. Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., & Hoi, S. C. (2021). Deep learning for person re-identification: A survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6), 2872-2893.
  2. Ye, M., Shen, J., Crandall, D. J., Shao, L., & Luo, J. (2020). Dynamic dual-attentive aggregation learning for visible-infrared person re-identification. In European Conference on Computer Vision (pp. 229-247). Springer, Cham.

✉️ Contact

If you have any questions, feel free to contact me at yiyuanzhang.ai@gmail.com.