Slicing Convolutional Neural Network for Crowd Video Understanding

This is the source code for "Slicing Convolutional Neural Network for Crowd Video Understanding". It aims to learn generic spatio-temporal features from crowd videos, with a focus on long-term temporal learning (i.e., 100 frames).

Overview

Three-branch Slicing CNN model (i.e. xy-, xt-, and yt-branch)

Crowd attribute recognition (i.e. 94 crowd-related attributes)
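
The three branches can be pictured as slicing a video volume along different planes. The sketch below is a minimal NumPy illustration of that idea, not the repository's implementation: the (T, H, W) axis order, the array sizes, and the function name slice_volume are assumptions made for this example.

```python
import numpy as np


def slice_volume(volume):
    """Return xy-, xt-, and yt-slices of a (T, H, W) volume.

    The (T, H, W) axis order is an assumption for illustration only.
    """
    # xy-slices: one H x W spatial image per time step -> (T, H, W)
    xy = volume
    # xt-slices: one T x W image per row y -> (H, T, W)
    xt = np.transpose(volume, (1, 0, 2))
    # yt-slices: one T x H image per column x -> (W, T, H)
    yt = np.transpose(volume, (2, 0, 1))
    return xy, xt, yt


# Hypothetical sizes: 100 frames, each 14 rows by 12 columns.
volume = np.arange(100 * 14 * 12, dtype=np.float32).reshape(100, 14, 12)
xy, xt, yt = slice_volume(volume)
print(xy.shape, xt.shape, yt.shape)
```

Each branch then sees 2D slices from a different plane of the same volume, which is how the xt- and yt-slices can carry temporal information alongside one spatial axis.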

Project Site

Caffe

This is a fork of the well-known Caffe framework that adds multi-GPU training and a Dimension Swap layer.

Apart from the official installation prerequisites, we have several other dependencies:

Install OpenMPI to enable multi-GPU training.
Install the required Python packages (e.g. numpy, scipy, scikit-image).
Add export PYTHONPATH="[path_python_layer]:$PYTHONPATH" to ~/.bashrc and restart the terminal. Here [path_python_layer] is the absolute path of the directory containing the Python script py_dim_swap_layer.py.
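
To confirm that the PYTHONPATH change took effect, a quick check like the one below can be run from a new terminal. It is only a sketch using the standard library; the helper name layer_is_importable is hypothetical and not part of this repository.

```python
import importlib.util
import sys


def layer_is_importable(module_name="py_dim_swap_layer"):
    """Return True if the module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if layer_is_importable():
    print("py_dim_swap_layer found on the Python path")
else:
    print("py_dim_swap_layer not found; check PYTHONPATH")
    print("Current search path:", sys.path)
```

If the module is reported as missing, the most likely cause is that PYTHONPATH does not include the directory holding py_dim_swap_layer.py, or that the terminal was not restarted after editing ~/.bashrc.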

Get the Caffe code

git clone --recursive https://github.com/amandajshao/Slicing-CNN.git

Files

Dataset

The WWW dataset, introduced at CVPR 2015, contains 10,000 crowd videos from 8,257 different crowded scenes, annotated with 94 attributes.

LMDB Data

The LMDB data used by the model, with training/validation/test splits.

CNN Initial Model

The initial model (VGG-16) is pre-trained on the UCF-101 action dataset (single frame) and fine-tuned on the WWW dataset (single frame).

Dropbox link

BaiduDisk link: http://pan.baidu.com/s/1jH5VLNw (password: 76zl)

CNN Best Model

Three models: SCNN-xy, SCNN-xt, SCNN-yt.

Dropbox link

BaiduDisk link: http://pan.baidu.com/s/1pK7h5sJ (password: j024)

Prototxt

The prototxt files correspond to the three models above (SCNN-xy/-xt/-yt).

Dropbox link

BaiduDisk link: http://pan.baidu.com/s/1o85xUI2 (password: mwvo)

Scripts

Two scripts are provided with the code: model_run.sh and extract_features.sh.

Related Projects

Deeply Learned Attributes for Crowd Scene Understanding

Thanks

Citation

J. Shao, C. C. Loy, K. Kang, and X. Wang. Slicing Convolutional Neural Network for Crowd Video Understanding. Computer Vision and Pattern Recognition (CVPR), 2016.

@inproceedings{shao2016scnn,
  title={Slicing Convolutional Neural Network for Crowd Video Understanding},
  author={Shao, Jing and Loy, Chen Change and Kang, Kai and Wang, Xiaogang},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}