R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

By Huijuan Xu, Abir Das and Kate Saenko (Boston University).

Introduction

We propose a fast end-to-end Region Convolutional 3D Network (R-C3D) for activity detection in continuous video streams. The network encodes the frames with fully-convolutional 3D filters, proposes activity segments, then classifies and refines them based on pooled features within their boundaries.

License

R-C3D is released under the MIT License (refer to the LICENSE file for details).

Citing R-C3D

If you find R-C3D useful in your research, please consider citing:

@inproceedings{Xu2017iccv,
    title = {R-C3D: Region Convolutional 3D Network for Temporal Activity Detection},
    author = {Huijuan Xu and Abir Das and Kate Saenko},
    booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
    year = {2017}
}

We build this repo based on Faster R-CNN, C3D and ActivityNet dataset. Please cite the following papers as well:

Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in neural information processing systems, pp. 91-99. 2015.

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3d convolutional networks." In Proceedings of the IEEE international conference on computer vision, pp. 4489-4497. 2015.

Caba Heilbron, Fabian, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. "Activitynet: A large-scale video benchmark for human activity understanding." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961-970. 2015.

Installation
Preparation
Training
Testing

Installation:

Clone the R-C3D repository.

git clone --recursive git@github.com:VisionLearningGroup/R-C3D.git

Build Caffe3d with pycaffe (see: Caffe installation instructions).

Note: Caffe must be built with Python support!

cd ./caffe3d

# If have all of the requirements installed and your Makefile.config in place, then simply do:
make -j8 && make pycaffe

Build R-C3D lib folder.
```
cd ./lib    
make
```

Preparation:

Download the ground truth annatations and videos in ActivityNet dataset.

cd ./preprocess/activityNet/

# Download the groud truth annotations in ActivityNet dataset.
wget http://ec2-52-11-11-89.us-west-2.compute.amazonaws.com/files/activity_net.v1-3.min.json

# Download the videos in ActivityNet dataset into ./preprocess/activityNet/videos.
python download_video.py

Extract frames from downloaded videos in 25 fps.

# training video frames are saved in ./preprocess/activityNet/frames/training/
# validation video frames are saved in ./preprocess/activityNet/frames/validation/ 
python generate_frames.py

Generate the pickle data for training and testing R-C3D model.

# generate training data
python generate_roidb_training.py
# generate validation data
python generate_roidb_validation.py

Training:

Download C3D classification pretrain model to ./pretrain/ .

The C3D model weight pretrained on Sports1M and finetuned on ActivityNet dataset is provided in: caffemodel .

In R-C3D root folder, run:

./experiments/activitynet/script_train.sh

Testing:

Download one sample R-C3D model to ./snapshot/ .

One R-C3D model on ActivityNet dataset is provided in: caffemodel .

The provided R-C3D model has the Average-mAP 14.4% on the validation set.
In R-C3D root folder, generate the prediction log file on the validation set.
```
./experiments/activitynet/test/script_test.sh
```

Generate the results.json file from the prediction log file.

cd ./experiments/activitynet/test
python activitynet_log_analysis.py test_log_<iters>.txt.*

Get the detection evaluation result.

cd ./experiments/activitynet/test/Evaluation
python get_detection_performance.py data/activity_net.v1-3.min.json ../results.json

ivyvideo/R-C3D