Context-aware RCNNs: a Baseline for Action Detection in Videos

Source code for the following paper (arXiv link):

Context-aware RCNNs: a Baseline for Action Detection in Videos
Jianchao Wu, Zhanghui Kuang, Limin Wang, Wayne Zhang, Gangshan Wu
in ECCV 2020

Our implementation is based on the Video-long-term-feature-banks (LFB) codebase.

Prepare dataset

Please follow the LFB instructions to prepare the AVA dataset.

Please follow the LFB instructions to set up the Caffe2 environment.

Please download the R50-I3D-NL pretrained weights and put them in the [code root]/pretrained_weights folder.
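A minimal shell sketch of the placement step, assuming the download produced a single weights file. The helper `install_weights` is hypothetical and not part of the repository; pass it whatever file the R50-I3D-NL download gave you and the path to your code root.

```shell
#!/usr/bin/env bash
# install_weights is a hypothetical helper, not part of the repository.
# Usage: install_weights <downloaded_weights_file> <code_root>
install_weights() {
  local weights_file="$1"
  local code_root="$2"
  local dest="${code_root}/pretrained_weights"

  # Fail early if the downloaded file is missing.
  if [ ! -f "${weights_file}" ]; then
    echo "Weights file not found: ${weights_file}" >&2
    return 1
  fi

  # Create [code root]/pretrained_weights if needed, then move the file into it.
  mkdir -p "${dest}"
  mv "${weights_file}" "${dest}/"
  echo "Installed $(basename "${weights_file}") into ${dest}"
}
```

For example, `install_weights ~/Downloads/<weights file> .` when run from the code root. The function returns a non-zero status instead of exiting, so it can be sourced into an interactive shell without closing it on error.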

To train the baseline model, run:

bash train_baseline.sh configs/avabox_r50_baseline_32x2_scale1_5.yaml

To train the baseline model with scene feature, run:

bash train_baseline.sh configs/avabox_r50_baseline_16x4_scale1_5_withScene.yaml

To train the model with scene feature and LFB, run the following two stages.

Stage 1. Train a baseline model that will be used to infer LFB:

bash train_baseline.sh configs/avabox_r50_baseline_16x4_scale1_5.yaml

Stage 2. Train a model with scene feature and LFB:

bash train_lfb.sh configs/avabox_r50_lfb_win60_L3_16x4_withScene.yaml [path to baseline model weights from Stage 1]
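The two stages can be chained in one script. A minimal sketch, assuming it is run from the code root and that you supply the Stage 1 weight path yourself (the sketch does not guess where train_baseline.sh writes its output). The helpers `run_cmd` and `run_lfb_pipeline` and the `DRY_RUN` switch are hypothetical, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: run_lfb_pipeline <stage1_weights_path>
run_lfb_pipeline() {
  local stage1_weights="$1"
  local baseline_cfg="configs/avabox_r50_baseline_16x4_scale1_5.yaml"
  local lfb_cfg="configs/avabox_r50_lfb_win60_L3_16x4_withScene.yaml"

  # Stage 1: baseline model used to infer LFB.
  run_cmd bash train_baseline.sh "${baseline_cfg}" || return 1

  # Stage 2 needs the Stage 1 weights; check they exist (skipped in dry-run mode).
  if [ "${DRY_RUN:-0}" != "1" ] && [ ! -f "${stage1_weights}" ]; then
    echo "Stage 1 weights not found: ${stage1_weights}" >&2
    return 1
  fi

  # Stage 2: model with scene feature and LFB.
  run_cmd bash train_lfb.sh "${lfb_cfg}" "${stage1_weights}"
}
```

Running `DRY_RUN=1 run_lfb_pipeline path/to/stage1_weights` only prints the two training commands, which is a cheap way to check the command lines before committing to a long training job.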