reconstruction-network-for-video-captioning

This project implements RecNet, proposed in "Reconstruction Network for Video Captioning" (CVPR 2018).

Requirements

  • Ubuntu 16.04
  • CUDA 9.0
  • cuDNN 7.3.1
  • Java 1.8
  • Python 2.7.12
    • PyTorch 1.0
    • Other python libraries specified in requirements.txt

How to use

Step 1. Setup python virtual environment

$ pip install virtualenv
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt

Step 2. Prepare Data

  1. Extract feature vectors of the datasets by following the instructions here, and place them at ~/data/<dataset>/features/<network>.hdf5.

    e.g. InceptionV4 feature vectors of the MSVD dataset should be located at ~/data/MSVD/features/InceptionV4.hdf5.

  2. Set hyperparameters in config.py and split the dataset into train / val / test sets by running the following command.

    (.env) $ python -m scripts.split
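
The exact split depends on the settings in config.py, but the partition conventionally used for MSVD in the captioning literature is 1200 training / 100 validation / 670 test videos. A minimal sketch of such an index-based split (the function name and defaults are illustrative, not this repo's API):

```python
def split_msvd_indices(num_videos=1970, n_train=1200, n_val=100):
    """Partition contiguous video indices into train/val/test lists.

    Defaults follow the conventional MSVD split (1200/100/670);
    every index after the train and val ranges becomes the test set.
    """
    train = list(range(0, n_train))
    val = list(range(n_train, n_train + n_val))
    test = list(range(n_train + n_val, num_videos))
    return train, val, test

train, val, test = split_msvd_indices()
print(len(train), len(val), len(test))  # 1200 100 670
```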
    

Step 3. Train

  1. Set hyperparameters in config.py.
  2. Run
    (.env) $ python train.py
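
Training optimizes a joint objective: the decoder's cross-entropy captioning loss plus the reconstructor's reconstruction loss, weighted by a trade-off hyperparameter λ (set, like the other hyperparameters, in config.py). A schematic of the combination, with illustrative names and values:

```python
def joint_loss(caption_loss, reconstruction_loss, lam=0.2):
    """RecNet's training objective: L = L_caption + lambda * L_reconstruction.

    lam trades caption quality against reconstruction fidelity;
    the default here is illustrative, not this repo's actual setting.
    """
    return caption_loss + lam * reconstruction_loss

print(joint_loss(2.0, 1.0, lam=0.5))  # 2.5
```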
    

Step 4. Inference

  1. Set hyperparameters in config.py.
  2. Run
    (.env) $ python run.py
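
run.py generates captions with the trained decoder. The simplest decoding strategy is greedy decoding: pick the highest-scoring word at every step until an end token appears. A toy sketch of that loop, independent of this repo's actual decoding code (beam search may be used instead):

```python
def greedy_decode(step_fn, start_token, end_token, max_len=20):
    """Repeatedly ask step_fn for next-word scores and take the argmax.

    step_fn(prev_word) -> dict mapping candidate words to scores.
    Stops at end_token or after max_len words.
    """
    words = []
    prev = start_token
    for _ in range(max_len):
        scores = step_fn(prev)
        prev = max(scores, key=scores.get)
        if prev == end_token:
            break
        words.append(prev)
    return words

# Toy "language model" with fixed transition scores, for demonstration only.
transitions = {
    "<s>": {"a": 0.9, "the": 0.1},
    "a": {"man": 0.8, "dog": 0.2},
    "man": {"runs": 0.7, "</s>": 0.3},
    "runs": {"</s>": 0.9},
}
caption = greedy_decode(lambda w: transitions[w], "<s>", "</s>")
print(" ".join(caption))  # a man runs
```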
    

Result

Comparison with the original paper

NOTE: For now, only 2D features are used for evaluating our model (3D features are missing).

  • MSVD

    | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
    | --- | --- | --- | --- | --- |
    | Ours (w/o reconstructor) | 39.4 | 27.2 | 37.8 | 61.8 |
    | Ours (global) | 40.7 | 27.3 | 34.4 | 61.9 |
    | Ours (local) | 35.3 | 27.3 | 35.2 | 61.9 |
    | Paper (global) | 51.1 | 34.0 | 69.4 | 79.7 |
    | Paper (local) | 52.3 | 34.1 | 69.8 | 80.7 |
  • MSR-VTT

    | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
    | --- | --- | --- | --- | --- |
    | Ours | - | - | - | - |
    | Paper (global) | 38.3 | 26.2 | 59.1 | 41.7 |
    | Paper (local) | 39.1 | 26.6 | 59.3 | 42.7 |

TODO

  • Add qualitative results.
  • Add C3D feature vectors.
  • Add MSR-VTT dataset.