This is the official code release for the R3-Transformer proposed in "Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language."
All dependencies are included in the model's container. First, install the latest version of Docker. Then pull our Docker image:
docker pull hassanhub/vid_cap:latest
Then run the container:
docker run --gpus all --name r3_container -it -v /home/
Note: This image already includes CUDA-related drivers and dependencies.
Alternatively, you can create your own environment and make sure the following dependencies are installed:
- Python 3.7/3.8
- TensorFlow 2.3
- CUDA 10.1
- NVIDIA Driver v440.100
- cuDNN 7.6.5
- opencv-python
- h5py
- transformers
- matplotlib
- scikit-image
- nvidia-ml-py3
- decord
- pandas
- tensorcore.dataflow
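If you build your own environment, a quick check like the following can verify that the key packages import and that TensorFlow sees a GPU. This is a minimal sketch; the exact version pins should match the list above.

```python
# Minimal environment sanity check (illustrative; version pins per the list above).
import tensorflow as tf
import h5py, cv2, decord, transformers

print("TensorFlow:", tf.__version__)                      # expected 2.3.x
print("GPUs:", tf.config.list_physical_devices("GPU"))    # should list at least one GPU
print("h5py:", h5py.__version__)
print("transformers:", transformers.__version__)
```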
To speed up data loading, we use a multi-chunk HDF5 format. There are two options for preparing the data for training/evaluation.
Download features pre-extracted with SlowFast-50-8x8 pre-trained on Kinetics-400 from this link (a short example of reading a downloaded chunk is shown after the list):
- Parts 0-10 (coming soon...)
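Once a chunk is downloaded, it can be inspected with h5py. The snippet below is only a minimal sketch: the file name features_part0.h5 and the per-video key/dataset layout are illustrative assumptions, not the exact schema of the released files.

```python
import h5py

# Minimal sketch for inspecting one downloaded chunk.
# File name and internal layout are assumptions for illustration only.
with h5py.File("features_part0.h5", "r") as f:
    video_ids = list(f.keys())
    print("videos in this chunk:", len(video_ids))

    sample = f[video_ids[0]]
    feats = sample["features"][()]   # e.g. (num_clips, feature_dim) SlowFast features
    print(video_ids[0], feats.shape)
```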
Alternatively, you can follow these steps to extract a customized set of features with your own visual backbone:
- Download YouCook II
- Download ActivityNet Captions
- Pre-process raw video files using this script
- Extract visual features with your own backbone or our pre-trained SlowFast-50-8x8 using this script
- Store features and captions in the multi-chunk HDF5 format using this script (a rough sketch of this step is shown below)
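For reference, the sketch below illustrates the idea behind the last step: splitting (video id, features, caption) triplets across several HDF5 files. Dataset names, chunk size, and the caption encoding are assumptions for illustration only; the provided script defines the actual format.

```python
import h5py
import numpy as np

def write_chunks(samples, chunk_size=500, prefix="feats"):
    """Write (video_id, features, caption) samples into several HDF5 chunk files.

    Illustrative sketch only: dataset names, chunk size, and caption encoding
    are assumptions, not the repository's exact schema.
    """
    for start in range(0, len(samples), chunk_size):
        path = f"{prefix}_part{start // chunk_size}.h5"
        with h5py.File(path, "w") as f:
            for video_id, feats, caption in samples[start:start + chunk_size]:
                grp = f.create_group(video_id)
                grp.create_dataset("features", data=feats, compression="gzip")
                grp.create_dataset("caption", data=caption.encode("utf-8"))
        print("wrote", path)

# Toy usage with random arrays standing in for real backbone features.
dummy = [(f"video_{i}", np.random.rand(32, 2304).astype("float32"), "a person cooks")
         for i in range(4)]
write_chunks(dummy, chunk_size=2)
```

Keeping each chunk relatively small is what allows the data pipeline to shard files across workers, which is the motivation for the multi-chunk layout.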