
[ECCV 2022] An End-to-End Transformer Model for Crowd Localization

Primary language: Python. License: MIT.

CLTR (Crowd Localization TRansformer)

[Project page] [paper]

An official implementation of "An End-to-End Transformer Model for Crowd Localization" (accepted by ECCV 2022).

  • The code in this version is not yet well organized and may contain some unclear comments.

Environment

python == 3.6
pytorch == 1.8.0
opencv-python
scipy
h5py
pillow
imageio
nni
mmcv
tensorboard
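Before training, it can help to confirm that the packages listed above are importable. This is a minimal sketch of our own, not part of the repository: `missing_modules` is a hypothetical helper, and the import names (`cv2` for opencv-python, `PIL` for pillow) are the standard ones for those packages. It only checks that each module can be found, not that its version matches.

```python
import importlib.util
import sys

# Import names for the dependencies above; some differ from the pip names
# (opencv-python -> cv2, pillow -> PIL).
REQUIRED_MODULES = ["torch", "cv2", "scipy", "h5py", "PIL", "imageio",
                    "nni", "mmcv", "tensorboard"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch
        print("PyTorch", torch.__version__)
```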

Datasets

  • Download the JHU-CROWD++ dataset from here
  • Download the NWPU-Crowd dataset (resized) from Baidu (password: 04i4) or OneDrive

Prepare data

Generate point map

cd CLTR/data
For the JHU-Crowd++ dataset: python prepare_jhu.py --data_path /xxx/xxx/jhu_crowd_v2.0
For the NWPU-Crowd dataset: python prepare_nwpu.py --data_path /xxx/xxx/NWPU_CLTR
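To illustrate the general idea of a point map, the sketch below rasterizes head annotations into an image-sized array with one unit per annotated head, so the map sums to the crowd count. This is an assumption about the concept, not a description of prepare_jhu.py or prepare_nwpu.py: we have not checked the exact format those scripts write, and `points_to_map` is a hypothetical helper.

```python
import numpy as np


def points_to_map(points, height, width):
    """Hypothetical helper: rasterize (x, y) head annotations into a point map.

    Each annotated head adds 1 at its (rounded, clipped) pixel, so the sum of
    the map equals the number of annotated heads.
    """
    point_map = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        # Round to the nearest pixel and clip coordinates that fall outside the image.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        point_map[row, col] += 1.0  # "+=" keeps the count if two heads share a pixel
    return point_map


heads = [(3.2, 1.7), (0.0, 0.0), (9.9, 4.4)]
pm = points_to_map(heads, height=5, width=8)
print(pm.sum())  # 3.0
```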

Generate image list

cd CLTR
python make_npydata.py --jhu_path /xxx/xxx/jhu_crowd_v2.0 --nwpu_path /xxx/xxx/NWPU_CLTR
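Judging by its name, make_npydata.py stores the image lists as .npy files. A minimal sketch of that idea, assuming a list is simply a sorted array of image paths saved with NumPy; `save_image_list`, `load_image_list`, and the `*.jpg` pattern are hypothetical and may not match what the script actually writes.

```python
import glob
import os

import numpy as np


def save_image_list(image_dir, out_file, pattern="*.jpg"):
    """Hypothetical helper: collect matching image paths and save them as a .npy array."""
    paths = sorted(glob.glob(os.path.join(image_dir, pattern)))
    np.save(out_file, np.array(paths))  # string array, no pickling needed
    return paths


def load_image_list(npy_file):
    """Load a saved list back as plain Python strings."""
    return [str(p) for p in np.load(npy_file)]
```

One list would be saved per dataset split, and the training code would then read it back with a loader like `load_image_list`.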

Training

Example (some hyper-parameters may be different from the original paper):
cd CLTR
sh experiments/jhu.sh
or
sh experiments/nwpu.sh

  • If you do not have enough GPUs, change nproc_per_node and gpu_id in jhu.sh/nwpu.sh.
  • All random seeds are fixed, so different runs under the same settings will report the same results.
  • The model will be saved in CLTR/save_file/log_file.
  • Note that using an FPN will improve performance, but it is not included in this version.
  • Tuning some hyper-parameters (e.g., the image size, crop size, and number of queries) will also improve performance.
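Fixing seeds in a PyTorch project usually means seeding Python, NumPy, and PyTorch together. The sketch below shows that common pattern; `fix_seeds` is a hypothetical name, and we have not checked which exact calls the training scripts use. The cuDNN flags matter on GPU, because some cuDNN kernels are otherwise nondeterministic even with fixed seeds.

```python
import random

import numpy as np

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


def fix_seeds(seed):
    """Seed Python, NumPy and (if available) PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```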

Here we compare the results of this repo with those reported in the original paper.

NWPU-Crowd (val set)     | MAE  | MSE
Original paper           | 61.9 | 246.3
This repo (training log) | 51.3 | 116.7
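MAE and MSE are computed from per-image crowd counts. A minimal sketch, assuming this repo follows the usual crowd-counting convention in which the reported "MSE" is the square root of the mean squared error; `counting_errors` is a hypothetical helper, and the counts below are made-up inputs, not results.

```python
import math


def counting_errors(pred_counts, gt_counts):
    """Return (MAE, MSE) over per-image counts.

    Following the common crowd-counting convention, "MSE" is the square root
    of the mean squared error.
    """
    if len(pred_counts) != len(gt_counts) or not pred_counts:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


mae, mse = counting_errors([10, 20], [12, 16])
print(mae, round(mse, 4))  # 3.0 3.1623
```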

Testing

Example:
python test.py --dataset jhu --pre model.pth --gpu_id 2,3
or
python test.py --dataset nwpu --pre model.pth --gpu_id 0,1

  • model.pth is the checkpoint obtained from the training phase.
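For localization, predicted points are typically matched one-to-one to ground-truth heads within a distance threshold, and precision, recall, and F1 are computed from the matches. The sketch below uses a simple greedy matching (closest pairs first) instead of the Hungarian algorithm; the function names and the threshold value are hypothetical, and test.py may use a different matching rule and dataset-specific thresholds.

```python
import math


def match_points(pred, gt, threshold):
    """Greedy one-to-one matching of predicted and ground-truth (x, y) points.

    Candidate pairs are visited from closest to farthest; a pair is matched
    when both points are still free and their distance is <= threshold.
    Returns the number of matched pairs (true positives).
    """
    pairs = sorted(
        (math.hypot(px - gx, py - gy), i, j)
        for i, (px, py) in enumerate(pred)
        for j, (gx, gy) in enumerate(gt)
    )
    used_pred, used_gt = set(), set()
    for dist, i, j in pairs:
        if dist > threshold:
            break  # all remaining pairs are even farther apart
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
    return len(used_pred)


def localization_f1(pred, gt, threshold):
    """Return (precision, recall, F1) for one image."""
    tp = match_points(pred, gt, threshold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


p, r, f1 = localization_f1([(0, 0), (10, 10), (50, 50)], [(1, 1), (11, 10)], threshold=4)
print(round(f1, 2))  # 0.8
```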

Video Demo

Example:
python video_demo.py --video_path ./video_demo/demo.mp4 --num_queries 700 --pre video_model.pth

  • The "video_model.pth" checkpoint (trained on the NWPU-Crowd training set) can be downloaded from Baidu Disk (password: rw6b) or Google Drive.
  • The generated video will be named "out_video.avi".
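Assuming CLTR follows the DETR design acknowledged below, where each query produces at most one prediction, --num_queries caps how many people can be localized in a single frame, so dense scenes need enough queries. A rough sketch of the post-processing step, assuming each query outputs an (x, y) point and a confidence score; `select_points` and the 0.5 threshold are our own assumptions, not taken from video_demo.py, and the scores are made-up values.

```python
def select_points(points, scores, score_threshold=0.5):
    """Keep query predictions whose confidence exceeds score_threshold.

    points: one (x, y) per query; scores: the matching confidences in [0, 1].
    The result can never contain more than len(points) == num_queries entries.
    """
    if len(points) != len(scores):
        raise ValueError("points and scores must have one entry per query")
    return [p for p, s in zip(points, scores) if s > score_threshold]


# Hypothetical output of a model run with num_queries = 4.
points = [(12.0, 40.5), (80.2, 33.1), (5.0, 5.0), (60.7, 90.0)]
scores = [0.93, 0.71, 0.08, 0.55]
kept = select_points(points, scores)
print(len(kept))  # 3
```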


Visit bilibili or YouTube to watch the video demo.

Acknowledgement

Thanks to the following great works:

@inproceedings{carion2020end,
  title={End-to-end object detection with transformers},
  author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
  booktitle={European conference on computer vision},
  pages={213--229},
  year={2020},
  organization={Springer}
}
@inproceedings{meng2021conditional,
  title={Conditional detr for fast training convergence},
  author={Meng, Depu and Chen, Xiaokang and Fan, Zejia and Zeng, Gang and Li, Houqiang and Yuan, Yuhui and Sun, Lei and Wang, Jingdong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3651--3660},
  year={2021}
}

Reference

If you find this project useful, please cite:

@inproceedings{liang2022end,
  title={An end-to-end transformer model for crowd localization},
  author={Liang, Dingkang and Xu, Wei and Bai, Xiang},
  booktitle={European Conference on Computer Vision},
  year={2022}
}