This repository contains code for a reproduction and subsequent analysis of the paper "Am I done? Predicting action progress in videos". The reproduction was carried out as part of a project for the Deep Learning course.

The full description of the reproduction and our findings can be found at *Predicting action progress in videos - paper reproduction*.
- Clone our fork of the original Faster R-CNN repository:

  ```bash
  git clone https://github.com/gsotirchos/realtime-action-detection
  ```
- Download the pre-trained Faster R-CNN model (`rgb-ssd300_ucf24_120000.pth`) from the authors' Google Drive, or with `gdown`:

  ```bash
  gdown https://drive.google.com/uc?id=1IyqjUQofRyYrAQ-Uz7MPsSqVBlBe-Zk7
  ```
- Download the UCF24 dataset used in the Faster R-CNN paper from the same Google Drive link, or with `gdown`:

  ```bash
  gdown https://drive.google.com/uc?id=1o2l6nYhd-0DDXGP-IPReBP4y1ffVmGSE
  ```
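  `gdown` itself can be installed from PyPI if it is not already available:

  ```bash
  pip install gdown
  ```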
- Place both the pre-trained model (`rgb-ssd300_ucf24_120000.pth`) and the downloaded `.tar.gz` file in the `realtime-action-detection` directory, and extract the dataset tarball:

  ```bash
  mv rgb-ssd300_ucf24_120000.pth realtime-action-detection/
  tar -xvf ucf24.tar.gz
  mv ucf24 realtime-action-detection/
  ```
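  A quick sanity check of the resulting layout (this assumes the tarball extracts to a top-level `ucf24/` directory, as the `mv` command above expects):

  ```bash
  ls realtime-action-detection/rgb-ssd300_ucf24_120000.pth
  ls realtime-action-detection/ucf24
  ```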
- Outside the `realtime-action-detection` directory, clone this repo:

  ```bash
  git clone https://github.com/anishhdiwan/am-i-done
  ```
- Run `main.py` inside the `am-i-done` directory to train the LSTM model from the reproduced paper (a sketch of what this model computes follows this list), OR run `load_faster_r_cnn.py` inside the `realtime-action-detection` directory to get frame-wise Faster R-CNN detections:

  ```bash
  cd am-i-done
  python3 main.py
  # OR
  cd realtime-action-detection
  python3 load_faster_r_cnn.py
  ```
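For context on the first option: the LSTM predicts a scalar progress value for each frame of an ongoing action. Below is a minimal, illustrative PyTorch sketch of such a model; the class name, feature dimension, hidden size, and linear progress targets are all assumptions made for illustration, not the reproduction's actual configuration (see `main.py` for that).

```python
# Illustrative sketch of a frame-wise action progress LSTM (not the
# reproduction's actual model; layer sizes and targets are assumptions).
import torch
import torch.nn as nn

class ProgressLSTM(nn.Module):
    """Hypothetical progress regressor: per-frame features -> progress in (0, 1)."""

    def __init__(self, feat_dim=512, hidden_dim=64):  # feat_dim is an assumption
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, frames):  # frames: (batch, time, feat_dim)
        hidden, _ = self.lstm(frames)           # per-frame hidden states
        return self.head(hidden).squeeze(-1)    # (batch, time) progress values

# One training step against linearly increasing ground-truth progress
# (0 at the action's start, 1 at its end), a common choice when no
# finer-grained progress annotation exists.
model = ProgressLSTM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.randn(4, 30, 512)               # dummy batch: 4 clips, 30 frames
target = torch.linspace(0, 1, 30).expand(4, 30)  # linear progress targets
optim.zero_grad()
loss = nn.functional.mse_loss(model(features), target)
loss.backward()
optim.step()
```

The sigmoid head keeps predictions in (0, 1), matching the convention that an action starts at progress 0 and finishes at 1.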
Singh G, Saha S, Sapienza M, Torr PHS, Cuzzolin F. Online real-time multiple spatiotemporal action localisation and prediction. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2017. pp. 3637–3646.