A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal

  • Single-image crowd counting is a challenging computer vision problem with wide applications in public safety, city planning, traffic management, etc.
  • This survey provides a comprehensive summary of recent advances in crowd counting techniques based on Convolutional Neural Networks (CNNs) via density map estimation.
  • Our goals are to provide an up-to-date review of recent approaches and to introduce new researchers in this field to the design principles and trade-offs.

Our long survey paper (23 pages) has been accepted to Neurocomputing 2022 [paper]


Introduction

Crowd counting techniques applied in different domains

  • Privacy preserving crowd monitoring: Counting people without people models or tracking [paper]
  • Learning to count objects in images [paper]
  • Towards perspective-free object counting with deep learning [paper]

Detection-based

  • Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection [paper]
  • Shape-based human detection and segmentation via hierarchical part-template matching [paper]
  • Counting crowded moving objects [paper]

Regression-based

  • Bayesian Poisson regression for crowd counting [paper]
  • Counting people with low-level features and Bayesian regression [paper]
  • Deep people counting in extremely dense crowds [paper]

Density map estimation

More recently, crowd counting via density map estimation has emerged as a promising approach. Such methods achieve high accuracy in crowded scenes while preserving the spatial distribution of people in the image.
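Density map methods usually turn head point annotations into a ground-truth density map by placing a normalized Gaussian kernel at each annotated head, so that summing the map gives the number of people. The sketch below is a minimal pure-Python illustration, assuming a fixed kernel width `sigma`; geometry-adaptive kernels (introduced with MCNN) instead set `sigma` per head. The function names are hypothetical and not taken from any cited paper's code.

```python
import math

def gaussian_density_map(points, height, width, sigma=4.0):
    """Build a ground-truth density map whose sum equals the number of heads.

    points: (row, col) head annotations in pixels.
    sigma: a fixed Gaussian width (an assumption; geometry-adaptive kernels
    choose sigma per head from nearest-neighbour distances).
    """
    radius = int(3 * sigma)  # truncate the kernel at 3 sigma
    density = [[0.0] * width for _ in range(height)]
    for row, col in points:
        r0, c0 = int(round(row)), int(round(col))
        cells = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                weight = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                cells.append((r, c, weight))
        total = sum(w for _, _, w in cells)
        if total == 0.0:
            continue  # the kernel falls entirely outside the image
        # Renormalise the truncated kernel so every head contributes exactly 1.
        for r, c, w in cells:
            density[r][c] += w / total
    return density

def count_from_density(density):
    """The estimated count is the integral (sum) of the density map."""
    return sum(sum(row) for row in density)

heads = [(10, 12), (0, 0), (30, 45)]
dmap = gaussian_density_map(heads, height=32, width=48)
print(round(count_from_density(dmap), 6))  # 3.0
```

Renormalizing per head keeps the count exact even for heads near the image border, where part of the kernel would otherwise be cut off.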

We summarize by comparing the aforementioned three major crowd counting approaches in the following table.

[Table: comparison of the detection-based, regression-based, and density map estimation approaches]

We review recent advances with detailed comparisons of three major design modules for crowd counting: deep neural network design, loss functions, and supervisory signals.

Datasets and Performance Evaluation

Datasets

Criteria considered when choosing datasets

  • Image resolution
  • Number of images in the dataset
  • Object count

Publicly available datasets

We extract and present some typical images from the public datasets in the following figure.

[Figure: typical images from the public datasets]

  • NWPU-Crowd: NWPU-Crowd: A large-scale benchmark for crowd counting [paper]

  • UCF_QNRF: Composition loss for counting, density map estimation and localization in dense crowds [paper]

  • GCC: Pixel-Wise Crowd Understanding via Synthetic Data [paper]

  • Fudan-ShanghaiTech: Locality-constrained spatial transformer network for video crowd counting [paper]

  • ShanghaiTech A & B: Single-image crowd counting via multi-column convolutional neural network [paper]

  • WorldExpo'10: Cross-scene crowd counting via deep convolutional neural networks [paper]

  • UCF_CC_50: Multi-source multi-scale counting in extremely dense crowd images [paper]

  • Mall: Feature mining for localised crowd counting [paper]

  • UCSD: Privacy preserving crowd monitoring: Counting people without people models or tracking [paper]

Performance Evaluation and Metrics

  • Accuracy: counting accuracy and location accuracy
  • Quality of density map: resolution and visual quality
  • Complexity: computation complexity and annotation complexity
  • Flexibility and robustness
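Counting accuracy is commonly reported as the mean absolute error (MAE) and the root mean squared error (often labeled MSE in crowd counting papers) over per-image counts, while the Grid Average Mean absolute Error (GAME) splits each map into 4^L cells so that correct localization is rewarded as well. A minimal sketch, assuming density maps are given as nested lists; the helper names are hypothetical:

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error of per-image counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def rmse(pred_counts, gt_counts):
    """Root mean squared error of per-image counts (often reported as "MSE")."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / len(gt_counts))

def region_sums(density, level):
    """Split an H x W map into 4**level non-overlapping cells and sum each."""
    h, w, n = len(density), len(density[0]), 2 ** level
    sums = []
    for i in range(n):
        for j in range(n):
            sums.append(sum(density[r][c]
                            for r in range(i * h // n, (i + 1) * h // n)
                            for c in range(j * w // n, (j + 1) * w // n)))
    return sums

def game(pred_maps, gt_maps, level):
    """Grid Average Mean absolute Error; GAME(0) reduces to MAE."""
    total = 0.0
    for pred, gt in zip(pred_maps, gt_maps):
        total += sum(abs(p - g) for p, g in zip(region_sums(pred, level), region_sums(gt, level)))
    return total / len(gt_maps)

print(mae([10, 20], [12, 18]), rmse([10, 20], [12, 18]))  # 2.0 2.0
pred, gt = [[1, 0], [0, 0]], [[0, 0], [0, 1]]
# Right count, wrong location: GAME(0) is 0 but GAME(1) exposes the error.
print(game([pred], [gt], 0), game([pred], [gt], 1))  # 0.0 2.0
```

Density map quality is often reported separately with image-quality measures such as PSNR and SSIM.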

Deep Neural Network Design

Fully Convolutional Network

  • Fully convolutional crowd counting on highly congested scenes [paper]
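Because a fully convolutional network has no fully connected layers, it accepts images of any size and outputs a density map whose resolution follows the input. The sketch below propagates an input size through a layer list using the standard convolution output-size formula; the layer configuration (3x3 "same" convolutions and three 2x2 max-pools, giving a 1/8-resolution map, as in VGG-16-based models such as CSRNet) is a hypothetical example.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def fcn_output_shape(height, width, layers):
    """Propagate an arbitrary input size through a list of layer configs."""
    for layer in layers:
        height = conv_output_size(height, **layer)
        width = conv_output_size(width, **layer)
    return height, width

# Hypothetical network: 3x3 "same" convolutions and three 2x2 max-pools.
conv = {"kernel": 3, "stride": 1, "padding": 1}
pool = {"kernel": 2, "stride": 2}
layers = [conv, pool, conv, pool, conv, pool, conv]

print(fcn_output_shape(768, 1024, layers))  # (96, 128)
print(fcn_output_shape(600, 800, layers))   # (75, 100)
```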

Encoder-Decoder Architecture

  • Scale aggregation network for accurate and efficient crowd counting [paper]
  • Crowd counting and density estimation by trellis encoder-decoder networks [paper]
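In encoder-decoder designs the decoder recovers spatial resolution lost in the encoder, and ground-truth maps are often resized to match the network output. Whenever a density map is resized, its values must be rescaled so that the total count is unchanged. A minimal sketch, assuming map sides divisible by the integer factor; both helpers are hypothetical:

```python
def downsample_sum(density, factor):
    """Sum-pool by an integer factor; the total count is preserved
    (assumes both map sides are divisible by factor)."""
    h, w = len(density) // factor, len(density[0]) // factor
    return [
        [sum(density[r * factor + i][c * factor + j]
             for i in range(factor) for j in range(factor))
         for c in range(w)]
        for r in range(h)
    ]

def upsample_nearest(density, factor):
    """Nearest-neighbour upsampling, dividing each value by factor**2
    so that the total count is preserved."""
    scale = 1.0 / (factor * factor)
    out = []
    for row in density:
        expanded = [v * scale for v in row for _ in range(factor)]
        out.extend(list(expanded) for _ in range(factor))
    return out

full = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
low = downsample_sum(full, 2)        # [[14.0, 22.0]]
high = upsample_nearest(low, 2)      # 2 x 4 map again
print(low, sum(map(sum, high)))      # [[14.0, 22.0]] 36.0
```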

Multi-Column and Pyramid Network

Multi-Column architecture

  • Single-image crowd counting via multi-column convolutional neural network [paper]

  • Improving the learning of multi-column convolutional neural network for crowd counting [paper]

Pyramid architecture

  • Crowd counting by adaptively fusing predictions from an image pyramid [paper]

  • Generating high-quality crowd density maps using contextual pyramid CNNs [paper]
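Multi-column and pyramid networks produce predictions at several scales that must be merged. One common pattern is pixel-wise weighting: a learned branch scores each scale at each location, and a softmax turns the scores into weights. The sketch below shows only that generic fusion step, taking the scores as given; it is not the exact fusion scheme of any paper listed above, and the function name is hypothetical.

```python
import math

def adaptive_fuse(density_maps, score_maps):
    """Pixel-wise softmax fusion of per-scale density maps.

    density_maps, score_maps: one H x W map per scale (or column), already
    resized to a common resolution. The scores would normally come from a
    small learned branch; here they are plain inputs.
    """
    h, w = len(density_maps[0]), len(density_maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            scores = [s[r][c] for s in score_maps]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            fused[r][c] = sum(e / z * d[r][c] for e, d in zip(exps, density_maps))
    return fused

coarse, fine = [[2.0, 0.0]], [[4.0, 1.0]]
# Equal scores average the scales; a much larger score selects one scale.
result = adaptive_fuse([coarse, fine], [[[0.0, 0.0]], [[0.0, 10.0]]])
print(result)
```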

Dilated and Deformable Convolutional Operations

Dilated convolution

  • An aggregated multicolumn dilated convolution network for perspective-free counting [paper]

  • DENet: A universal network for counting crowd with varying densities and scales [paper]
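Dilated convolution spaces the kernel taps `d` pixels apart, so a 3x3 kernel covers a (2d+1)x(2d+1) area with the same nine weights and without further downsampling. The receptive-field calculation below makes this concrete; layers are described as hypothetical `(kernel, stride, dilation)` tuples.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    layers: sequence of (kernel, stride, dilation) tuples.
    """
    rf, jump = 1, 1  # jump: input-pixel distance between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf

plain = [(3, 1, 1)] * 6
dilated = [(3, 1, 2)] * 6
growing = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]
print(receptive_field(plain), receptive_field(dilated), receptive_field(growing))  # 13 25 15
```

Six 3x3 layers with dilation 2 see a 25x25 window instead of 13x13 with an identical parameter count, and increasing the dilation layer by layer grows the receptive field even faster.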

Deformable convolution

  • DADNet: Dilated-attention-deformable convnet for crowd counting [paper]

  • ADCrowdNet: An attention-injective deformable convolutional network for crowd understanding [paper]
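Deformable convolution adds a learned 2-D offset to every kernel tap, so the sampling grid can adapt to the shape and scale of heads; fractional sampling positions are read with bilinear interpolation. A minimal single-channel sketch of one output value, assuming the offsets are given (in a real network a parallel convolution predicts them); the names are hypothetical.

```python
import math

def bilinear(feature, y, x):
    """Sample an H x W map at a fractional (y, x); points outside read as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value

def deformable_conv3x3_at(feature, kernel, offsets, r, c, dilation=1):
    """One output value of a single-channel 3x3 deformable convolution.

    kernel: 3x3 weights. offsets: 3x3 grid of (dy, dx) shifts, one per tap.
    With all-zero offsets this reduces to an ordinary (dilated) convolution.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            y = r + (i - 1) * dilation + dy
            x = c + (j - 1) * dilation + dx
            out += kernel[i][j] * bilinear(feature, y, x)
    return out

feature = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
ones = [[1.0] * 3 for _ in range(3)]
zero = [[(0.0, 0.0)] * 3 for _ in range(3)]
print(deformable_conv3x3_at(feature, ones, zero, 1, 1))  # 36.0
```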

Attention-based Model

  • SCAR: Spatial-/channel-wise attention regression networks for crowd counting [paper]

  • Relational attention network for crowd counting [paper]

  • Attend to count: Crowd counting with adaptive capacity multi-scale CNNs [paper]
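Attention modules of this kind typically learn a spatial map that emphasizes head regions and suppresses background, and/or per-channel gates that reweight feature maps. The simplified sketch below shows only the reweighting arithmetic: the spatial logits are taken as given, and the channel gate omits the learned layers of a full squeeze-and-excitation block. It is an illustration with hypothetical names, not SCAR's or any other listed paper's exact module.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def spatial_attention(features, logits):
    """Scale every channel by a per-pixel attention weight in (0, 1).

    features: C x H x W nested lists; logits: H x W scores that a learned
    branch would normally predict.
    """
    att = [[sigmoid(v) for v in row] for row in logits]
    return [[[f * a for f, a in zip(frow, arow)] for frow, arow in zip(ch, att)]
            for ch in features]

def channel_attention(features):
    """Simplified channel gating: global average pooling then a sigmoid
    (the learned fully connected layers of a real SE block are omitted)."""
    out = []
    for ch in features:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out

feats = [[[1.0, 2.0]], [[2.0, -2.0]]]  # C=2 channels of a 1 x 2 map
print(spatial_attention(feats, [[0.0, 0.0]]))  # [[[0.5, 1.0]], [[1.0, -1.0]]]
print(channel_attention(feats))
```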

Others

Graph neural network-based method

  • Hybrid Graph Neural Networks for Crowd Counting [paper]

Recurrent neural network-based method

  • Crowd counting using deep recurrent spatial-aware network [paper]

  • End-to-end crowd counting via joint learning local and global count [paper]

Combining with detection

  • Where are the blobs: Counting by localization with point supervision [paper]

  • DecideNet: Counting varying density crowds through attention guided detection and density estimation [paper]

Loss Function

Euclidean Loss

  • CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes [paper]
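CSRNet and many other density-map models train with a pixel-wise Euclidean (L2) loss between predicted and ground-truth density maps, L = 1/(2N) Σ_i ||Z(X_i; Θ) - Z_i^GT||². A minimal sketch over a batch of N nested-list maps; the function name is hypothetical:

```python
def euclidean_loss(pred_maps, gt_maps):
    """L = 1/(2N) * sum_i ||pred_i - gt_i||^2 over a batch of N density maps."""
    total = 0.0
    for pred, gt in zip(pred_maps, gt_maps):
        total += sum((p - g) ** 2
                     for prow, grow in zip(pred, gt)
                     for p, g in zip(prow, grow))
    return total / (2 * len(gt_maps))

batch_pred = [[[1.0, 2.0]], [[0.0, 0.0]]]
batch_gt = [[[0.0, 0.0]], [[0.0, 1.0]]]
print(euclidean_loss(batch_pred, batch_gt))  # 1.5
```

Because this loss scores each pixel independently, it is often observed to yield blurry density maps, which motivates the structure-aware losses in the next subsection.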

SSIM Loss

  • Crowd counting with deep structured scale integration network [paper]

  • Cross-Level Parallel Network for Crowd Counting [paper]
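SSIM-based losses compare local means, variances, and covariances of the two maps rather than individual pixels, which penalizes blurred or structurally wrong density maps. The sketch computes 1 - mean local SSIM with uniform 3x3 windows; the window type, the constants (which assume a dynamic range of 1), and the function name are assumptions, and the listed papers use their own variants.

```python
def ssim_loss(pred, gt, window=3, c1=1e-4, c2=9e-4):
    """1 - mean local SSIM between two density maps.

    Uses uniform (box) windows slid with stride 1; maps must be at least
    window x window. c1 = (0.01 L)^2 and c2 = (0.03 L)^2 assume L = 1.
    """
    h, w = len(pred), len(pred[0])
    scores = []
    for r in range(h - window + 1):
        for c in range(w - window + 1):
            xs = [pred[r + i][c + j] for i in range(window) for j in range(window)]
            ys = [gt[r + i][c + j] for i in range(window) for j in range(window)]
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            vx = sum((x - mx) ** 2 for x in xs) / n
            vy = sum((y - my) ** 2 for y in ys) / n
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
            luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
            contrast_structure = (2 * cov + c2) / (vx + vy + c2)
            scores.append(luminance * contrast_structure)
    return 1.0 - sum(scores) / len(scores)

flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(ssim_loss(peak, peak), ssim_loss(flat, peak))
```

In practice such a term is usually combined with a pixel-wise loss rather than used alone.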

Adversarial Loss

  • Adversarial learning for multiscale crowd counting under complex scenes [paper]

  • Atrous convolutions spatial pyramid network for crowd counting and density estimation [paper]
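Adversarial training adds a discriminator that tries to tell predicted density maps from ground-truth ones; the counting network (the generator) is trained with a pixel-wise loss plus a term that rewards fooling the discriminator. A minimal sketch of the two loss terms, assuming the discriminator's probabilities are given; the weight `lam` and the function names are hypothetical.

```python
import math

def generator_loss(pred_map, gt_map, disc_prob_on_pred, lam=1e-3, eps=1e-12):
    """Pixel-wise Euclidean term plus a non-saturating adversarial term.

    disc_prob_on_pred: the discriminator's probability that the predicted
    map is a real (ground-truth) map. lam is a hypothetical weighting.
    """
    euclidean = 0.5 * sum((p - g) ** 2
                          for prow, grow in zip(pred_map, gt_map)
                          for p, g in zip(prow, grow))
    adversarial = -math.log(max(disc_prob_on_pred, eps))
    return euclidean + lam * adversarial

def discriminator_loss(prob_on_gt, prob_on_pred, eps=1e-12):
    """Binary cross-entropy: ground-truth maps labeled 1, predictions 0."""
    return -(math.log(max(prob_on_gt, eps)) + math.log(max(1.0 - prob_on_pred, eps)))

# A discriminator that is fooled (p = 0.9) adds little to the generator loss.
print(generator_loss([[1.0, 0.0]], [[1.0, 1.0]], disc_prob_on_pred=0.9))
```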

Multi-task Learning

  • Crowd counting via scale-adaptive convolutional neural network [paper]

  • Counting with focus for free [paper]
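Multi-task objectives add auxiliary targets to the density map loss. A common example is an extra count-level term; the sketch below adds a relative count loss, in the spirit of the one used by the scale-adaptive CNN listed above, where dividing by (ground-truth count + 1) keeps dense images from dominating. The exact form, the weight `lam`, and the function name are assumptions.

```python
def multitask_loss(pred_map, gt_map, lam=0.1):
    """Density map loss plus a relative count loss, weighted by lam.

    The count term divides the count error by (gt_count + 1) so that sparse
    images are not swamped by dense ones. lam is a hypothetical weighting.
    """
    density_term = 0.5 * sum((p - g) ** 2
                             for prow, grow in zip(pred_map, gt_map)
                             for p, g in zip(prow, grow))
    pred_count = sum(map(sum, pred_map))
    gt_count = sum(map(sum, gt_map))
    count_term = 0.5 * ((pred_count - gt_count) / (gt_count + 1.0)) ** 2
    return density_term + lam * count_term

print(multitask_loss([[1.0, 1.0]], [[0.0, 0.0]], lam=1.0))  # 3.0
```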

Others

  • Learning to count with CNN boosting [paper]

  • Nonlinear regression via deep negative correlation learning [paper]

  • From open set to closed set: Counting objects by spatial divide-and-conquer [paper]
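Deep negative correlation learning trains an ensemble of regressors whose average is the final prediction and adds a penalty that encourages members to differ from the ensemble mean. A minimal sketch of the classic per-member negative correlation loss for one sample; `lam` and the function name are hypothetical.

```python
def ncl_losses(member_preds, target, lam=0.5):
    """Per-member negative correlation learning losses for one sample.

    L_k = 0.5 * (F_k - y)^2 - lam * (F_k - F_bar)^2, where F_bar is the
    ensemble mean (the final prediction). The penalty term rewards members
    that deviate from the mean, encouraging diversity. lam is hypothetical.
    """
    f_bar = sum(member_preds) / len(member_preds)
    return [0.5 * (f - target) ** 2 - lam * (f - f_bar) ** 2 for f in member_preds]

preds = [2.0, 4.0]
print(ncl_losses(preds, 3.0, lam=0.0), ncl_losses(preds, 3.0, lam=0.5))  # [0.5, 0.5] [0.0, 0.0]
```

With `lam = 0` each member is trained independently; larger values trade individual accuracy for ensemble diversity.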

Supervisory Signal

Fully Supervised Learning

  • Adaptive density map generation for crowd counting [paper]

  • Bayesian loss for crowd count estimation with point supervision [paper]
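Bayesian loss skips the hand-crafted Gaussian ground truth: for every annotated point it computes the expected count under a posterior that assigns each pixel to nearby annotations, and asks each expectation to equal 1. The pure-Python sketch follows that idea with equal priors and without the background term of the full method; `sigma` and the function name are assumptions.

```python
import math

def bayesian_loss(density, points, sigma=8.0):
    """Bayesian-loss-style objective from point annotations (no background term).

    density: H x W predicted map; points: (row, col) annotations.
    The posterior that pixel x_m belongs to head n is a softmax over the
    Gaussian likelihoods N(x_m; z_n, sigma^2); the expected count of head n
    is sum_m p(n | x_m) * D(x_m), and each expectation should be 1.
    """
    expected = [0.0] * len(points)
    for r, row in enumerate(density):
        for c, d in enumerate(row):
            logits = [-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2) for pr, pc in points]
            top = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(l - top) for l in logits]
            z = sum(weights)
            for n, w in enumerate(weights):
                expected[n] += w / z * d
    return sum(abs(1.0 - e) for e in expected)

pred = [[1.0, 0.0], [0.0, 1.0]]
heads = [(0, 0), (1, 1)]
print(bayesian_loss(pred, heads, sigma=0.1))  # close to 0: each head gets ~1
print(bayesian_loss([[0.0, 0.0], [0.0, 0.0]], heads))  # 2.0
```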

Weakly Supervised and Semi-supervised Learning

  • HA-CCN: Hierarchical attention-based crowd counting network [paper]

  • Generalizing semi-supervised generative adversarial networks to regression using feature contrasting [paper]

Unsupervised and Self-supervised Learning

  • Almost unsupervised learning for dense crowd counting [paper]

  • Leveraging unlabeled data for crowd counting by learning to rank [paper]
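Learning to rank exploits unlabeled images through a free ordering constraint: a crop contained in a larger crop cannot contain more people. A pairwise hinge loss penalizes predicted counts that violate this order. A minimal sketch; in practice each count would come from running the network on the crop, and the crop-generation helper, the 0.75 scale, and the margin are hypothetical choices.

```python
def ranking_loss(nested_counts, margin=0.0):
    """Pairwise hinge loss for a chain of nested crops.

    nested_counts[k] is the predicted count for crop k, and crop k + 1 lies
    inside crop k, so counts should never increase along the chain.
    """
    return sum(max(0.0, inner - outer + margin)
               for outer, inner in zip(nested_counts, nested_counts[1:]))

def nested_crops(height, width, num, scale=0.75):
    """Hypothetical helper: centred boxes (top, left, bottom, right), each
    inside the previous one and `scale` times its side lengths."""
    boxes = [(0, 0, height, width)]
    for _ in range(num - 1):
        top, left, bottom, right = boxes[-1]
        h, w = int((bottom - top) * scale), int((right - left) * scale)
        new_top = top + ((bottom - top) - h) // 2
        new_left = left + ((right - left) - w) // 2
        boxes.append((new_top, new_left, new_top + h, new_left + w))
    return boxes

boxes = nested_crops(100, 100, 3)   # [(0, 0, 100, 100), (12, 12, 87, 87), (21, 21, 77, 77)]
print(ranking_loss([30.0, 24.0, 18.0]), ranking_loss([30.0, 32.0, 18.0]))  # 0.0 2.0
```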

Automatic Labeling through Synthetic Data

  • Learning from synthetic data for crowd counting in the wild [paper]

  • Focus on semantic consistency for cross-domain crowd understanding [paper]

Conclusion and Future Directions

  • Automatic and lightweight network design
  • Weakly supervised and unsupervised crowd counting
  • Crowd counting in videos
  • Multi-view fusion for crowd counting

References

If you find this work or code useful, please cite:

@article{bai2022survey,
  title={A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal},
  author={Bai, Haoyue and Mao, Jiageng and Chan, S-H Gary},
  journal={Neurocomputing},
  year={2022},
  publisher={Elsevier}
}