Source code for the following paper (arXiv link):
Context-aware RCNNs: a Baseline for Action Detection in Videos
Jianchao Wu, Zhanghui Kuang, Limin Wang, Wayne Zhang, Gangshan Wu
in ECCV 2020
Our implementation is based on Video-long-term-feature-banks (LFB).
Please follow the LFB instructions to prepare the AVA dataset.
Please follow the LFB instructions to prepare the Caffe2 environment.
Please download the pretrained R50-I3D-NL model and put it in the [code root]/pretrained_weights folder.
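The placement step can be sketched as a small shell helper. This is a minimal sketch, assuming the R50-I3D-NL weights have already been downloaded to a local file. The function name `install_pretrained_weights` and its two arguments are hypothetical and are not part of this repository or of LFB.

```shell
# Hypothetical helper (not part of the repository): copy an already-downloaded
# R50-I3D-NL weight file into [code root]/pretrained_weights.
install_pretrained_weights() {
  local code_root="$1"     # repository root (assumption: passed by the caller)
  local weights_file="$2"  # path to the downloaded weights (assumption)

  # Fail early instead of silently creating an empty folder.
  if [ ! -f "$weights_file" ]; then
    echo "weights file not found: $weights_file" >&2
    return 1
  fi

  # Create the folder if needed, then copy the weights into it.
  mkdir -p "$code_root/pretrained_weights"
  cp "$weights_file" "$code_root/pretrained_weights/"
}
```

For example, run it from the code root as `install_pretrained_weights . /path/to/downloaded/weights`.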
To train the baseline model, run:
bash train_baseline.sh configs/avabox_r50_baseline_32x2_scale1_5.yaml
To train the baseline model with the scene feature, run:
bash train_baseline.sh configs/avabox_r50_baseline_16x4_scale1_5_withScene.yaml
To train a model with both the scene feature and LFB, run the following two stages.
Stage 1. Train a baseline model that will be used to infer the LFB:
bash train_baseline.sh configs/avabox_r50_baseline_16x4_scale1_5.yaml
Stage 2. Train a model with the scene feature and LFB:
bash train_lfb.sh configs/avabox_r50_lfb_win60_L3_16x4_withScene.yaml [path to baseline model weights from Stage 1]
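The two stages must run in order, because the Stage 2 command takes the Stage 1 weights as an argument. Below is a minimal sketch of a wrapper that chains them, assuming it is run from the code root. The function name `run_lfb_pipeline` and the `DRY_RUN` variable are hypothetical. The Stage 1 weight path is passed in by the caller because where Stage 1 writes its checkpoint is not specified here.

```shell
# Hypothetical wrapper (not part of the repository) that runs both stages in order.
# $1 is the path where Stage 1 is expected to leave its final weights (assumption).
# If DRY_RUN is set, the commands are only printed, not executed.
run_lfb_pipeline() {
  local baseline_weights="$1"
  local run="${DRY_RUN:+echo}"  # expands to "echo" when DRY_RUN is non-empty

  # Stage 1: baseline model that will be used to infer the LFB.
  $run bash train_baseline.sh configs/avabox_r50_baseline_16x4_scale1_5.yaml || return 1

  # Stage 2 needs the Stage 1 weights, so stop if they are missing.
  if [ -z "$DRY_RUN" ] && [ ! -f "$baseline_weights" ]; then
    echo "Stage 1 weights not found: $baseline_weights" >&2
    return 1
  fi

  # Stage 2: model with the scene feature and LFB.
  $run bash train_lfb.sh configs/avabox_r50_lfb_win60_L3_16x4_withScene.yaml "$baseline_weights"
}
```

For example, `DRY_RUN=1 run_lfb_pipeline /path/to/stage1_weights` only prints the two commands, which is a cheap way to check the order and arguments before launching a long training run.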