Event Sequence Generation Network

Unofficial re-implementation of the Event Sequence Generation Network (ESGN) from the paper "Streamlined Dense Video Captioning". Note that, unlike the original model, we do not use SST to encode the proposal-level features; the proposals used here are generated by DBG instead.

Environment

Python 3.6.2
CUDA 10.0, PyTorch 1.2.0 (may work on other versions but has not been tested)
Other dependencies: run pip install -r requirement.txt

Prerequisites

C3D features: download the C3D feature file (sub_activitynet_v1-3.c3d.hdf5) from here, convert the h5 file into npy files, and place them in ./data/c3d.
Annotations and proposals: download the annotation files and the pre-generated proposal files (top-100 proposals generated by DBG) from Google Drive, and place them in ./data.
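The h5-to-npy conversion step can be sketched as below. This is a minimal sketch, not the repository's own script, and it rests on two assumptions: that each top-level group in the HDF5 file is named after a video id and stores its features in a dataset called `c3d_features`, and that the data loader expects one `<video_id>.npy` file per video in ./data/c3d. Check the data loader for the exact file naming. The helper names `export_features` and `convert_h5_to_npy` are hypothetical.

```python
import os

import numpy as np


def export_features(videos, out_dir, key="c3d_features"):
    """Write each video's feature matrix to <out_dir>/<video_id>.npy.

    `videos` is any mapping from video id to a group-like mapping that holds
    the feature array under `key`; an open h5py.File behaves this way.
    ASSUMPTION: the dataset name `c3d_features` and the `<video_id>.npy`
    naming are guesses, not taken from this repository.
    Returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for video_id in videos:
        # np.asarray reads the whole dataset into memory (h5py datasets support this)
        feats = np.asarray(videos[video_id][key])
        np.save(os.path.join(out_dir, video_id + ".npy"), feats)
        count += 1
    return count


def convert_h5_to_npy(h5_path, out_dir, key="c3d_features"):
    """Open the HDF5 file read-only and export every video it contains."""
    # h5py is imported here so export_features can be used without it installed
    import h5py

    with h5py.File(h5_path, "r") as f:
        return export_features(f, out_dir, key)
```

For example, `convert_h5_to_npy("sub_activitynet_v1-3.c3d.hdf5", "./data/c3d")` writes one file per video. Keeping the loop in a function that accepts any mapping lets it be tried on small in-memory arrays before running it on the full feature file.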

Usage

Training

cfg_path=cfgs/esgn.yml
python train.py --cfg_path $cfg_path

Checkpoint files are saved in ./save.

Validation

python eval.py --eval_folder esgn_c3d_run0

Validation with re-ranking

python eval.py --eval_folder esgn_c3d_run0 --eval_esgn_rerank

Performance

Model	Proposal model	Avg. #proposals	Avg. recall (%)	Avg. precision (%)	F1 (%)	Download
Original ESGN	SST	2.85	55.58	57.57	56.66
My reimpl.	DBG	2.73	52.67	58.90	55.62	url
My reimpl. with reranking	DBG	1.66	37.66	67.47	48.33
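As a sanity check on the table, the sketch below recomputes F1 as the harmonic mean of the averaged recall and precision. That definition is an assumption: the evaluation script may instead average per-threshold F1 scores, which can give slightly different numbers. Under this assumption the two reimplementation rows are reproduced to within 0.01. The function name `f1_score` is hypothetical and unrelated to any library function of the same name.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (Avg. recall, Avg. precision, reported F1) copied from the two reimplementation rows
rows = {
    "My reimpl.": (52.67, 58.90, 55.62),
    "My reimpl. with reranking": (37.66, 67.47, 48.33),
}

for name, (r, p, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(r, p):.2f}, reported = {reported}")
```

The re-ranking row illustrates the trade-off in the table: keeping fewer proposals per video (1.66 vs 2.73) raises precision but lowers recall, and the harmonic mean penalizes the lower recall.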

Pretrained model

Download the pretrained model (link in the Performance table above) and put it in ./save/esgn_c3d_run0, then run python eval.py --eval_folder esgn_c3d_run0.

References

Awesome ImageCaptioning.pytorch project.
Official implementation of "Weakly Supervised Dense Event Captioning in Videos".

ttengwang/ESGN