A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal

Single image crowd counting is a challenging computer vision problem with wide applications in public safety, city planning, traffic management, etc.
This survey provides a comprehensive summary of recent advances in crowd counting based on Convolutional Neural Networks (CNNs) via density map estimation.
Our goals are to provide an up-to-date review of recent approaches and to introduce new researchers in this field to the design principles and trade-offs.

Our long survey paper (23 pages) has been accepted to Neurocomputing 2022 [paper].

Introduction

Crowd counting techniques applied in different domains

Privacy preserving crowd monitoring: Counting people without people models or tracking [paper]
Learning to count objects in images [paper]
Towards perspective-free object counting with deep learning [paper]

Detection-based

Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection [paper]
Shape-based human detection and segmentation via hierarchical part-template matching [paper]
Counting crowded moving objects [paper]

Regression-based

Bayesian Poisson regression for crowd counting [paper]
Counting people with low-level features and Bayesian regression [paper]
Deep people counting in extremely dense crowds [paper]

Density map estimation

More recently, crowd counting via density map estimation has emerged as a promising approach with encouraging results. Such approaches achieve high accuracy in crowded scenes and preserve the spatial distribution of people, because the network predicts a density map whose sum over any region gives the estimated number of people in that region.
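As a minimal sketch of how a ground-truth density map is commonly built from head annotations, the Python below places one normalized Gaussian kernel at each annotated head, so the map sums to the number of people and any sub-region sums to the local count. The fixed kernel width `sigma` and the function names `density_map` and `count_from_density` are illustrative assumptions; several methods instead adapt the kernel width per head.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Build a density map with one Gaussian per annotated head.

    points: list of (row, col) head positions in pixel coordinates.
    Each kernel is renormalized over the image grid, so every head
    contributes exactly 1 to the total even near the image border.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        kernel = [
            [math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2)) for c in range(width)]
            for r in range(height)
        ]
        total = sum(sum(row) for row in kernel)
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / total
    return dmap


def count_from_density(dmap, top=0, left=0, h=None, w=None):
    """Integrate (sum) the density map, optionally over a rectangular region."""
    h = len(dmap) - top if h is None else h
    w = len(dmap[0]) - left if w is None else w
    return sum(sum(row[left:left + w]) for row in dmap[top:top + h])


heads = [(2, 3), (10, 12), (0, 0)]
dmap = density_map(heads, height=16, width=16)
print(round(count_from_density(dmap), 6))  # 3.0
```

Summing only the top-left 6 x 6 region, `count_from_density(dmap, 0, 0, 6, 6)`, returns a value just under 2, which is the spatial information a single global count would lose.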

The following table compares the three major crowd counting approaches discussed above.

We review recent advances and compare them in detail along three major design modules for crowd counting: deep neural network design, loss functions, and supervisory signals.

Datasets and Performance Evaluation

Datasets

Metrics considered when choosing datasets

Image resolution
Number of images in the dataset
Object count

Publicly available datasets

We extract and present some typical images from the public datasets in the following figure.

NWPU-Crowd: NWPU-Crowd: A large-scale benchmark for crowd counting [paper]
UCF_QNRF: Composition loss for counting, density map estimation and localization in dense crowds [paper]
GCC: Pixel-Wise Crowd Understanding via Synthetic Data [paper]
Fudan-ShanghaiTech: Locality-constrained spatial transformer network for video crowd counting [paper]
ShanghaiTech A & B: Single-image crowd counting via multi-column convolutional neural network [paper]
WorldExpo'10: Cross-scene crowd counting via deep convolutional neural networks [paper]
UCF_CC_50: Multi-source multi-scale counting in extremely dense crowd images [paper]
Mall: Feature mining for localised crowd counting [paper]
UCSD: Privacy preserving crowd monitoring: Counting people without people models or tracking [paper]

Performance Evaluation and Metrics

Accuracy: counting accuracy and location accuracy
Quality of density map: resolution and visual quality
Complexity: computation complexity and annotation complexity
Flexibility and robustness
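Counting accuracy is usually reported as the mean absolute error (MAE) and the mean squared error (MSE) between predicted and ground-truth counts over the N test images. A minimal sketch follows. By convention in the crowd counting literature, the reported "MSE" is the square root of the mean squared error. The function names are illustrative.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error: (1/N) * sum_i |pred_i - gt_i|."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)


def mse(pred_counts, gt_counts):
    """Crowd-counting "MSE": sqrt((1/N) * sum_i (pred_i - gt_i)^2), i.e. an RMSE."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / len(gt_counts))


predicted = [110.0, 95.0, 300.0]
actual = [100.0, 100.0, 290.0]
print(round(mae(predicted, actual), 3), round(mse(predicted, actual), 3))  # 8.333 8.66
```

MAE reflects average counting accuracy, while MSE weights large errors more heavily and is often read as an indicator of robustness.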

Deep Neural Network Design

Fully Convolutional Network

Fully convolutional crowd counting on highly congested scenes [paper]

Encoder-Decoder Architecture

Scale aggregation network for accurate and efficient crowd counting [paper]
Crowd counting and density estimation by trellis encoder-decoder networks [paper]

Multi-Column and Pyramid Network

Multi-Column architecture

Single-image crowd counting via multi-column convolutional neural network [paper]
Improving the learning of multi-column convolutional neural network for crowd counting [paper]

Pyramid architecture

Crowd counting by adaptively fusing predictions from an image pyramid [paper]
Generating high-quality crowd density maps using contextual pyramid CNNs [paper]

Dilated and Deformable Convolutional Operations

Dilated convolution

An aggregated multicolumn dilated convolution network for perspective-free counting [paper]
DENet: A universal network for counting crowd with varying densities and scales [paper]
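Dilated convolution enlarges the receptive field without adding parameters or downsampling: a k x k kernel with dilation rate d spans k + (k - 1)(d - 1) pixels per side while still having k x k weights. The sketch below computes the receptive field of a stack of stride-1 convolutions, assuming no pooling for simplicity; `effective_kernel` and `receptive_field` are hypothetical helpers.

```python
def effective_kernel(k, d):
    """Span of a k x k kernel with dilation rate d (d = 1 is ordinary convolution)."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers):
    """Receptive field (per side, in pixels) of stacked stride-1 conv layers.

    layers: list of (kernel_size, dilation) pairs. Each layer widens the
    receptive field by effective_kernel(k, d) - 1 = (k - 1) * d pixels.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


print(receptive_field([(3, 1)] * 3))  # 7: three ordinary 3x3 layers
print(receptive_field([(3, 2)] * 3))  # 13: same number of weights, dilation 2
```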

Deformable convolution

DADNet: Dilated-attention-deformable ConvNet for crowd counting [paper]
ADCrowdNet: An attention-injective deformable convolutional network for crowd understanding [paper]

Attention-based Model

SCAR: Spatial-/channel-wise attention regression networks for crowd counting [paper]
Relational attention network for crowd counting [paper]
Attend to count: Crowd counting with adaptive capacity multi-scale CNNs [paper]

Others

Graph neural network-based method

Hybrid Graph Neural Networks for Crowd Counting [paper]

Recurrent neural network-based method

Crowd counting using deep recurrent spatial-aware network [paper]
End-to-end crowd counting via joint learning local and global count [paper]

Combining with detection

Where are the blobs: Counting by localization with point supervision [paper]
DecideNet: Counting varying density crowds through attention guided detection and density estimation [paper]

Loss Function

Euclidean Loss

CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes [paper]
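The Euclidean loss measures the pixel-wise squared error between estimated and ground-truth density maps. The form given in the CSRNet paper is L = 1/(2N) * sum_i ||D_pred(X_i) - D_gt(X_i)||_2^2 over a batch of N images. The pure-Python sketch below implements that formula on nested lists; the function name is illustrative, and a real training loop would compute it with a deep learning framework's tensor operations.

```python
def euclidean_loss(pred_maps, gt_maps):
    """L = 1/(2N) * sum_i ||pred_i - gt_i||_2^2 over a batch of N density maps.

    pred_maps, gt_maps: lists of N 2-D maps (lists of rows) with matching shapes.
    """
    n = len(gt_maps)
    total = 0.0
    for pred, gt in zip(pred_maps, gt_maps):
        for pred_row, gt_row in zip(pred, gt):
            total += sum((p - g) ** 2 for p, g in zip(pred_row, gt_row))
    return total / (2 * n)


pred_batch = [[[0.5, 0.0], [0.0, 0.1]], [[0.2, 0.2], [0.0, 0.0]]]
gt_batch = [[[0.4, 0.0], [0.0, 0.1]], [[0.0, 0.2], [0.0, 0.0]]]
print(round(euclidean_loss(pred_batch, gt_batch), 4))  # 0.0125
```

Because every pixel is penalized independently, this loss ignores local correlations within the density map.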

SSIM Loss

Crowd counting with deep structured scale integration network [paper]
Cross-Level Parallel Network for Crowd Counting [paper]
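An SSIM loss compares local statistics (mean, variance, and covariance) of the estimated and ground-truth density maps instead of individual pixels, and is minimized as 1 - mean(SSIM). The sketch below is a plain single-scale version, assuming a square uniform window, valid windows only, and the commonly used constants c1 = 0.01^2 and c2 = 0.03^2 for unit data range. The papers listed above may use more elaborate variants (for example, different windows or multiple scales), so treat this only as an illustration of the basic term.

```python
def ssim_loss(pred, gt, win=3, c1=1e-4, c2=9e-4):
    """1 - mean local SSIM between two equally sized 2-D density maps."""
    h, w = len(gt), len(gt[0])
    scores = []
    for r in range(h - win + 1):
        for c in range(w - win + 1):
            xs = [pred[r + i][c + j] for i in range(win) for j in range(win)]
            ys = [gt[r + i][c + j] for i in range(win) for j in range(win)]
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            vx = sum((x - mx) ** 2 for x in xs) / n
            vy = sum((y - my) ** 2 for y in ys) / n
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
            ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
            scores.append(ssim)
    return 1.0 - sum(scores) / len(scores)


blob = [[0.0, 0.1, 0.0, 0.0],
        [0.1, 0.4, 0.1, 0.0],
        [0.0, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
flat = [[0.05] * 4 for _ in range(4)]  # same total count (0.8), no local structure
print(abs(ssim_loss(blob, blob)) < 1e-9)  # True: identical maps
print(ssim_loss(flat, blob) > 0.0)        # True: same count, different structure
```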

Adversarial Loss

Adversarial learning for multiscale crowd counting under complex scenes [paper]
Atrous convolutions spatial pyramid network for crowd counting and density estimation [paper]

Multi-task Learning

Crowd counting via scale-adaptive convolutional neural network [paper]
Counting with focus for free [paper]

Others

Learning to count with CNN boosting [paper]
Nonlinear regression via deep negative correlation learning [paper]
From open set to closed set: Counting objects by spatial divide-and-conquer [paper]

Supervisory Signal

Fully Supervised Learning

Adaptive density map generation for crowd counting [paper]
Bayesian loss for crowd count estimation with point supervision [paper]
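Bayesian loss (listed above) supervises the network with point annotations directly rather than with a Gaussian-smoothed density map. Each pixel's predicted density is softly assigned to the annotated heads by a Gaussian posterior, and the expected count of every head is pushed toward 1. The sketch below assumes equal priors for all heads, an isotropic Gaussian likelihood with a fixed `sigma`, and an L1 penalty. It omits the background modelling that the paper adds in its extended variant, so it illustrates the idea rather than reproducing the reference implementation.

```python
import math


def bayesian_loss(dmap, points, sigma=2.0):
    """Point-supervised Bayesian loss: sum_n |1 - E[c_n]|.

    dmap: predicted density map (list of rows); points: (row, col) head annotations.
    E[c_n] = sum_m p(y_n | x_m) * D(x_m), where p(y_n | x_m) is a softmax over the
    Gaussian log-likelihoods of pixel x_m under each annotation (equal priors assumed).
    """
    expected = [0.0] * len(points)
    for r, row in enumerate(dmap):
        for c, density in enumerate(row):
            logits = [-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2) for pr, pc in points]
            peak = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(lg - peak) for lg in logits]
            norm = sum(weights)
            for n, wgt in enumerate(weights):
                expected[n] += wgt / norm * density
    return sum(abs(1.0 - e) for e in expected)


annotations = [(2, 2), (10, 10)]
good = [[0.0] * 12 for _ in range(12)]
good[2][2] = good[10][10] = 1.0  # one unit of density on each head
bad = [[0.0] * 12 for _ in range(12)]
bad[2][2] = 2.0                  # correct total count, wrong placement
print(round(bayesian_loss(good, annotations), 4), round(bayesian_loss(bad, annotations), 4))  # 0.0 2.0
```

In the second case the total predicted count is correct, yet the loss is high because the density is not distributed over both annotated heads.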

Weakly Supervised and Semi-supervised Learning

HA-CCN: Hierarchical attention-based crowd counting network [paper]
Generalizing semi-supervised generative adversarial networks to regression using feature contrasting [paper]

Unsupervised and Self-supervised Learning

Almost unsupervised learning for dense crowd counting [paper]
Leveraging unlabeled data for crowd counting by learning to rank [paper]
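Learning to rank (listed above) exploits a free constraint on unlabeled images: a crop contained in a larger crop cannot hold more people. A sketch of this idea under stated assumptions: the hypothetical helper `nested_crops` produces centered crops that shrink by a fixed `scale`, and `ranking_loss` applies a hinge penalty whenever the predicted count for an inner crop exceeds that of the crop enclosing it. The crop sampling and loss weighting in the paper may differ.

```python
def nested_crops(height, width, num_crops=4, scale=0.75):
    """Centered boxes (top, left, h, w), each contained in the previous one."""
    boxes = []
    h, w = height, width
    for _ in range(num_crops):
        boxes.append(((height - h) // 2, (width - w) // 2, h, w))
        h, w = max(1, int(h * scale)), max(1, int(w * scale))
    return boxes


def ranking_loss(counts, margin=0.0):
    """Hinge ranking loss for counts predicted on nested crops, largest crop first.

    Penalizes each adjacent pair where the inner crop's count exceeds the outer crop's.
    """
    return sum(max(0.0, inner - outer + margin) for outer, inner in zip(counts, counts[1:]))


print(nested_crops(256, 256, num_crops=3))  # [(0, 0, 256, 256), (32, 32, 192, 192), (56, 56, 144, 144)]
print(ranking_loss([40.0, 31.5, 30.0]))     # 0.0: counts shrink with the crop
print(ranking_loss([40.0, 45.0, 30.0]))     # 5.0: inner crop over-counted
```

No count labels are needed to compute this loss, which is why it can be applied to unlabeled images.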

Automatic Labeling through Synthetic Data

Learning from synthetic data for crowd counting in the wild [paper]
Focus on semantic consistency for cross-domain crowd understanding [paper]

Conclusion and Future Directions

Automatic and lightweight network design
Weakly supervised and unsupervised crowd counting
Crowd counting in videos
Multi-view fusion for crowd counting

References

If you find this work or code useful, please cite:

@article{bai2022survey,
  title={A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal},
  author={Bai, Haoyue and Mao, Jiageng and Chan, S-H Gary},
  journal={Neurocomputing},
  year={2022},
  publisher={Elsevier}
}
