POAR

Our paper: "POAR: Towards Open Vocabulary Pedestrian Attribute Recognition"

train-TEMS 📎

A PyTorch Lightning solution to train pedestrian open attribute recognition models

Dependencies

pip install 'git+https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup'
pip install pytorch-lightning

Usage 🚂

Finetuning 🚆

python train_finetune.py --folder data_dir --batch_size 512

Testing 🚆

CUDA_VISIBLE_DEVICES=0 python test_UDA_T2I.py --testset PA100K --trainset PETA --folder ./test/dataset/PETA/

Training with our DataModule 📉

As long as each image and its caption file share the same stem name (e.g. img1.png and img1.txt), all you need to do is specify the folder at runtime. Subfolder structure is ignored, meaning foo/bar/image1.jpg will always find its myster/folder/image1.txt so long as they share a common parent folder. All image suffixes will work; the only expectation is that captions are separated by \n.
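The stem-matching rule described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual DataModule: the function name pair_by_stem and the IMAGE_SUFFIXES set are hypothetical, and the sketch assumes caption files use the .txt extension and that stems are unique within the folder.

```python
from pathlib import Path

# Assumption: a small illustrative set of suffixes; the real DataModule
# is described as accepting any image suffix.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def pair_by_stem(folder):
    """Return {stem: (image_path, captions)} for image/text files sharing a stem.

    Subfolders are searched recursively, so the image and its caption file
    may live in different subfolders of the same parent folder.
    """
    images, texts = {}, {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images[path.stem] = path
        elif suffix == ".txt":
            texts[path.stem] = path

    pairs = {}
    # Keep only stems that have both an image and a caption file.
    for stem in sorted(images.keys() & texts.keys()):
        raw = texts[stem].read_text(encoding="utf-8")
        # Captions are separated by "\n"; blank lines are dropped.
        captions = [line.strip() for line in raw.split("\n") if line.strip()]
        pairs[stem] = (images[stem], captions)
    return pairs
```

Calling pair_by_stem("data_dir") would then give one entry per matched stem, which makes it easy to check that every image has found its caption file before starting a training run.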

Goal ⚽

Our aim is to create an easy-to-use Lightning implementation of pedestrian open attribute recognition. We will live by:

TEMS Framework Image

Citation

@inproceedings{Yue2023poar,
title={POAR: Towards Open Vocabulary Pedestrian Attribute Recognition},
author={Yue Zhang and Suchen Wang and Shichao Kan and Zhenyu Weng and Yigang Cen and Yap-Peng Tan},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia (MM '23), October 29--November 3, Ottawa, ON, Canada},
year={2023},
doi={10.1145/3581783.3611719}
}

TODO ✅

  • Get OpenAI's model creation script
  • Create model inits
    • ResNet50
    • ResNet50x4
    • ResNet101
    • ViT-B/32
    • all models
  • Create model wrapper
  • Create lightning trainer
  • Create dataset files
  • Performance boosts
    • Mixed-precision
    • Self-distillation
    • Gradient checkpointing
    • Half-precision Adam statistics
    • Half-precision stochastically rounded text encoder weights