This project implements RecNet, proposed in "Reconstruction Network for Video Captioning" (CVPR 2018). It was developed in the following environment:
- Ubuntu 16.04
- CUDA 9.0
- cuDNN 7.3.1
- Java 1.8
- Python 2.7.12
- PyTorch 1.0
- Other Python libraries specified in `requirements.txt`
Set up a virtual environment and install the dependencies:

```shell
$ pip install virtualenv
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt
```
- Extract feature vectors of the datasets by following the instructions here, and place them at `~/data/<dataset>/features/<network>.hdf5`. For example, the InceptionV4 feature vectors of the MSVD dataset would be located at `~/data/MSVD/features/InceptionV4.hdf5`.
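A quick way to sanity-check an extracted feature file is to open it with `h5py`. The layout assumed below — one `(num_frames, feature_dim)` dataset per video id — is an assumption for illustration, not something this repo guarantees:

```python
import h5py
import numpy as np

# Build a tiny dummy feature file mimicking the assumed layout:
# one (num_frames, feature_dim) float32 dataset per video id.
with h5py.File("InceptionV4_dummy.hdf5", "w") as f:
    f.create_dataset("vid1", data=np.random.rand(28, 1536).astype(np.float32))

# Inspect it the way a data loader might.
with h5py.File("InceptionV4_dummy.hdf5", "r") as f:
    for vid in f:
        print(vid, f[vid].shape)  # one line per video id with its shape
```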
- Set hyperparameters in `config.py`, and split the dataset into train / val / test sets by running the following command:

  ```shell
  (.env) $ python -m scripts.split
  ```
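For reference, the standard MSVD benchmark split is 1200 train / 100 val / 670 test videos. How `scripts.split` builds its split is not shown here, but a generic ratio-based split can be sketched as follows (the helper name, ratios, and seed are illustrative, not this repo's API):

```python
import random

def split_ids(video_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    # Hypothetical helper: shuffle the ids deterministically, then
    # cut them into train / val / test chunks by the given ratios.
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_ids(["vid%d" % i for i in range(100)])
print(len(train), len(val), len(test))  # 80 10 10
```

Keeping the split deterministic (fixed seed) matters for comparing captioning scores across runs.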
- Set hyperparameters in `config.py`.
- Run:

  ```shell
  (.env) $ python train.py
  ```
- Set hyperparameters in `config.py`.
- Run:

  ```shell
  (.env) $ python run.py
  ```
NOTE: For now, only 2D features are used for evaluating our model (3D features are missing).
- MSVD

  | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
  | :--- | :---: | :---: | :---: | :---: |
  | Ours (wo. reconstructor) | 39.4 | 27.2 | 37.8 | 61.8 |
  | Ours (global) | 40.7 | 27.3 | 34.4 | 61.9 |
  | Ours (local) | 35.3 | 27.3 | 35.2 | 61.9 |
  | Paper (global) | 51.1 | 34.0 | 69.4 | 79.7 |
  | Paper (local) | 52.3 | 34.1 | 69.8 | 80.7 |
- MSR-VTT

  | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
  | :--- | :---: | :---: | :---: | :---: |
  | Ours | - | - | - | - |
  | Paper (global) | 38.3 | 26.2 | 59.1 | 41.7 |
  | Paper (local) | 39.1 | 26.6 | 59.3 | 42.7 |
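The global and local variants above differ in the reconstructor architecture, but both are trained on the paper's joint objective: caption cross-entropy plus a λ-weighted reconstruction penalty between the reconstructed and original frame features. A minimal PyTorch sketch of that objective (the tensor shapes, λ = 0.2, and the MSE form of the reconstruction term are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def recnet_loss(logits, targets, recon_feats, orig_feats, lam=0.2):
    # Caption loss: standard token-level cross-entropy over the vocabulary.
    cap_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
    # Reconstruction loss: distance between the reconstructor's output
    # and the original frame features (MSE used here as a stand-in).
    rec_loss = F.mse_loss(recon_feats, orig_feats)
    return cap_loss + lam * rec_loss

# Toy tensors standing in for a real batch.
logits = torch.randn(4, 10, 3000)    # (batch, seq_len, vocab)
targets = torch.randint(0, 3000, (4, 10))
recon = torch.randn(4, 28, 1536)     # reconstructed frame features
orig = torch.randn(4, 28, 1536)      # original frame features
loss = recnet_loss(logits, targets, recon, orig)
print(loss.item())
```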
- Add qualitative results.
- Add C3D feature vectors.
- Add MSR-VTT dataset.