This repository contains the official PyTorch implementation of P2PNet, as described in Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework.
A brief introduction to P2PNet can be found at 机器之心 (almosthuman).
The code has been tested with PyTorch 1.5.0 and may not run with other versions.
The overall architecture of P2PNet. Built upon VGG16, it first introduces an upsampling path to obtain a fine-grained feature map. It then uses two branches to simultaneously predict a set of point proposals and their confidence scores.
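As a rough illustration of how the outputs of the two branches might be turned into a count, the sketch below keeps only the proposals whose confidence exceeds a threshold. The `Proposal` type, the `select_points` helper, and the threshold of 0.5 are assumptions made for this example, not names or values taken from this repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    """One predicted point with its confidence score (hypothetical type)."""
    x: float
    y: float
    score: float


def select_points(proposals: List[Proposal], threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Keep proposals whose confidence is above the (assumed) threshold."""
    return [(p.x, p.y) for p in proposals if p.score > threshold]


proposals = [
    Proposal(12.0, 30.5, 0.91),
    Proposal(48.2, 33.0, 0.12),
    Proposal(75.4, 28.9, 0.67),
]
points = select_points(proposals)
predicted_count = len(points)  # the count is the number of retained points
print(predicted_count, points)
```

With this kind of post-processing, the same set of retained points serves both tasks: their number is the count and their coordinates are the localization result.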
P2PNet achieves state-of-the-art performance on several challenging datasets with various densities.
Methods | Venue | SHTechPartA MAE/MSE | SHTechPartB MAE/MSE | UCF_CC_50 MAE/MSE | UCF_QNRF MAE/MSE |
---|---|---|---|---|---|
CAN | CVPR'19 | 62.3/100.0 | 7.8/12.2 | 212.2/243.7 | 107.0/183.0 |
Bayesian+ | ICCV'19 | 62.8/101.8 | 7.7/12.7 | 229.3/308.2 | 88.7/154.8 |
S-DCNet | ICCV'19 | 58.3/95.0 | 6.7/10.7 | 204.2/301.3 | 104.4/176.1 |
SANet+SPANet | ICCV'19 | 59.4/92.5 | 6.5/9.9 | 232.6/311.7 | -/- |
DUBNet | AAAI'20 | 64.6/106.8 | 7.7/12.5 | 243.8/329.3 | 105.6/180.5 |
SDANet | AAAI'20 | 63.6/101.8 | 7.8/10.2 | 227.6/316.4 | -/- |
ADSCNet | CVPR'20 | 55.4/97.7 | 6.4/11.3 | 198.4/267.3 | 71.3/132.5 |
ASNet | CVPR'20 | 57.78/90.13 | -/- | 174.84/251.63 | 91.59/159.71 |
AMRNet | ECCV'20 | 61.59/98.36 | 7.02/11.00 | 184.0/265.8 | 86.6/152.2 |
AMSNet | ECCV'20 | 56.7/93.4 | 6.7/10.2 | 208.4/297.3 | 101.8/163.2 |
DM-Count | NeurIPS'20 | 59.7/95.7 | 7.4/11.8 | 211.0/291.5 | 85.6/148.3 |
Ours | - | 52.74/85.06 | 6.25/9.9 | 172.72/256.18 | 85.32/154.5 |
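The MAE/MSE columns above compare predicted and ground truth counts per image. A minimal sketch of the two metrics follows, assuming the convention common in the crowd counting literature where "MSE" denotes the root of the mean squared count error. The function names and sample counts are made up for illustration.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], gt: Sequence[float]) -> None:
    """Both sequences must be non-empty and of equal length."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground truth counts."""
    _check(pred, gt)
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def mse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root of the mean squared count error (assumed meaning of 'MSE')."""
    _check(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


# Illustrative per-image counts, not taken from any dataset.
predicted = [105.0, 48.0, 310.0]
ground_truth = [100.0, 50.0, 300.0]
print(mae(predicted, ground_truth), mse(predicted, ground_truth))
```

Because the squared term penalizes large per-image errors more heavily, a method can rank well on MAE and less well on MSE, as several rows in the table show.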
Comparison on the NWPU-Crowd dataset.
Methods | MAE[O] | MSE[O] | MAE[L] | MAE[S] |
---|---|---|---|---|
MCNN | 232.5 | 714.6 | 220.9 | 1171.9 |
SANet | 190.6 | 491.4 | 153.8 | 716.3 |
CSRNet | 121.3 | 387.8 | 112.0 | 522.7 |
PCC-Net | 112.3 | 457.0 | 111.0 | 777.6 |
CANNet | 110.0 | 495.3 | 102.3 | 718.3 |
Bayesian+ | 105.4 | 454.2 | 115.8 | 750.5 |
S-DCNet | 90.2 | 370.5 | 82.9 | 567.8 |
DM-Count | 88.4 | 388.6 | 88.0 | 498.0 |
Ours | 77.44 | 362 | 83.28 | 553.92 |
The overall performance for both counting and localization.
| nAP$_{\delta}$ | SHTechPartA | SHTechPartB | UCF_CC_50 | UCF_QNRF | NWPU_Crowd |
|---|---|---|---|---|---|
| | 10.9% | 23.8% | 5.0% | 5.9% | 12.9% |
| | 70.3% | 84.2% | 54.5% | 55.4% | 71.3% |
| | 90.1% | 94.1% | 88.1% | 83.2% | 89.1% |
| | 64.4% | 76.3% | 54.3% | 53.1% | 65.0% |
Comparison for the localization performance in terms of F1-Measure on NWPU.
Method | F1-Measure | Precision | Recall |
---|---|---|---|
FasterRCNN | 0.068 | 0.958 | 0.035 |
TinyFaces | 0.567 | 0.529 | 0.611 |
RAZ | 0.599 | 0.666 | 0.543 |
Crowd-SDNet | 0.637 | 0.651 | 0.624 |
PDRNet | 0.653 | 0.675 | 0.633 |
TopoCount | 0.692 | 0.683 | 0.701 |
D2CNet | 0.700 | 0.741 | 0.662 |
Ours | 0.712 | 0.729 | 0.695 |
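The F1-Measure column is the harmonic mean of the precision and recall columns. The short sketch below shows the relationship. The helper names are hypothetical, and since the published precision and recall are already rounded, recomputing F1 from them may differ in the last digit for some rows.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the "Ours" row of the table above.
print(round(f1_measure(0.729, 0.695), 3))  # 0.712
```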
- Clone this repo into a directory named P2PNET_ROOT
- Organize your datasets as required
- Install Python dependencies. We use Python 3.6.5 and PyTorch 1.5.0:
pip install -r requirements.txt
We use a list file to collect all the images and their ground truth annotations in a counting dataset. When your dataset is organized as recommended below, each line of the list file gives an image path followed by the path of its annotation file:
train/scene01/img01.jpg train/scene01/img01.txt
train/scene01/img02.jpg train/scene01/img02.txt
...
train/scene02/img01.jpg train/scene02/img01.txt
DATA_ROOT/
|->train/
| |->scene01/
| |->scene02/
| |->...
|->test/
| |->scene01/
| |->scene02/
| |->...
|->train.list
|->test.list
DATA_ROOT is the path to the directory containing your counting dataset.
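If the dataset already follows the layout above, the list files can be generated with a small script. The following is a sketch under two assumptions that the layout does not state explicitly: images use the `.jpg` extension, and each annotation file sits next to its image with the same base name. The helpers `build_list` and `write_list` are hypothetical and not part of this repository.

```python
import os
from typing import List


def build_list(data_root: str, split: str, image_ext: str = ".jpg") -> List[str]:
    """Return 'image annotation' lines with paths relative to data_root."""
    lines = []
    split_dir = os.path.join(data_root, split)
    for dirpath, dirnames, filenames in os.walk(split_dir):
        dirnames.sort()  # visit scene directories in a deterministic order
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext.lower() != image_ext:
                continue
            annotation = stem + ".txt"
            if annotation not in filenames:
                continue  # skip images that have no annotation file
            rel_dir = os.path.relpath(dirpath, data_root).replace(os.sep, "/")
            lines.append(f"{rel_dir}/{name} {rel_dir}/{annotation}")
    return lines


def write_list(data_root: str, split: str) -> str:
    """Write DATA_ROOT/<split>.list and return its path."""
    list_path = os.path.join(data_root, f"{split}.list")
    with open(list_path, "w") as f:
        f.write("\n".join(build_list(data_root, split)) + "\n")
    return list_path


# Example (paths are placeholders):
# write_list("/path/to/DATA_ROOT", "train")
# write_list("/path/to/DATA_ROOT", "test")
```

Images without a matching annotation file are skipped rather than listed, so that every line in the resulting file has both entries.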
For each image, the annotations are stored in a single txt file with one annotated point per line. Note that pixel coordinates are indexed from 0. The expected format of each line is:
x1 y1
x2 y2
...
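A minimal sketch of reading such an annotation file is shown below. The function names are hypothetical, and the parser assumes whitespace-separated coordinates and ignores blank lines. It is not the loader used by this repository.

```python
from typing import List, Tuple


def parse_annotation(text: str) -> List[Tuple[float, float]]:
    """Parse 'x y' lines into a list of (x, y) points, skipping blank lines."""
    points = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'x y', got {line!r}")
        points.append((float(parts[0]), float(parts[1])))
    return points


def load_annotation(path: str) -> List[Tuple[float, float]]:
    """Read an annotation file from disk and parse it."""
    with open(path) as f:
        return parse_annotation(f.read())


sample = "10 20\n35.5 40\n\n"
points = parse_annotation(sample)
print(len(points), points)
```

The ground truth count of an image is simply the number of points in its annotation file.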
The network can be trained using the train.py script. For training on SHTechPartA, use:
CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT \
--dataset_file SHHA \
--epochs 3500 \
--lr_drop 3500 \
--output_dir ./logs \
--checkpoints_dir ./weights \
--tensorboard_dir ./logs \
--lr 0.0001 \
--lr_backbone 0.00001 \
--batch_size 8 \
--eval_freq 1 \
--gpu_id 0
By default, a periodic evaluation will be conducted on the validation set.
A trained model (with an MAE of 51.96) on SHTechPartA is available in "./weights". Run the following commands to launch a visualization demo:
CUDA_VISIBLE_DEVICES=0 python run_test.py --weight_path ./weights/SHTechA.pth --output_dir ./logs/
CUDA_VISIBLE_DEVICES=0 python video_demo.py --weight_path ./weights/SHTechA.pth
- Parts of the code are borrowed from the C^3 Framework.
- We refer to DETR to implement our matching strategy.
If you find P2PNet useful in your project, please consider citing us:
@inproceedings{song2021rethinking,
title={Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework},
author={Song, Qingyu and Wang, Changan and Jiang, Zhengkai and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Wu, Yang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2021}
}
- [AAAI2021] To Choose or to Fuse? Scale Selection for Crowd Counting. (paper link & code)
- [ICCV2021] Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting. (paper link & code)