Unofficial re-implementation of the Event Sequence Generation Network (ESGN) from the paper "Streamlined Dense Video Captioning". Note that, unlike the original model, we do not use SST to encode the proposal-level features.
- Python 3.6.2
- CUDA 10.0, PyTorch 1.2.0 (may work on other versions but has not been tested)
- Other modules: run `pip install -r requirement.txt`
- C3D features. Download the C3D feature file (`sub_activitynet_v1-3.c3d.hdf5`) from here. Convert the HDF5 file into npy files and place them in `./data/c3d`.
- Download the annotation files and the pre-generated proposal files (top-100 proposals generated by DBG) from Google Drive, and place them in `./data`.
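
The HDF5-to-npy conversion mentioned above could be sketched as follows. This is a minimal sketch, not the repository's own script. It assumes that each top-level key of the HDF5 file is a video ID holding a `c3d_features` dataset, and that the data loader expects one `<video_id>.npy` file per video. The names `convert_c3d_to_npy` and `convert_file` are hypothetical. Check both assumptions against the downloaded file and the loader before relying on it.

```python
import os

import numpy as np


def convert_c3d_to_npy(videos, out_dir, dataset_key="c3d_features"):
    """Write one <video_id>.npy file per entry of `videos` into `out_dir`.

    `videos` is any mapping video_id -> group, where group[dataset_key]
    supports [...] indexing; an open h5py.File fits this (assumed layout).
    Returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    for video_id in videos:
        # [...] reads the whole dataset into memory as a NumPy array
        features = np.asarray(videos[video_id][dataset_key][...])
        np.save(os.path.join(out_dir, video_id + ".npy"), features)
        written += 1
    return written


def convert_file(h5_path, out_dir):
    """Open the real HDF5 file with h5py and convert every video in it."""
    import h5py  # imported here so the helper above works without h5py

    with h5py.File(h5_path, "r") as h5_file:
        return convert_c3d_to_npy(h5_file, out_dir)
```

Under these assumptions, calling `convert_file("sub_activitynet_v1-3.c3d.hdf5", "./data/c3d")` would write the per-video files into the folder named above.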
- Training
cfg_path=cfgs/esgn.yml
python train.py --cfg_path $cfg_path
Checkpoint files are saved in `./save`.
- Validation
python eval.py --eval_folder esgn_c3d_run0
- Validation with re-ranking
python eval.py --eval_folder esgn_c3d_run0 --eval_esgn_rerank
Model | Proposal model | Avg. proposal number | Avg. recall | Avg. precision | F1 | Download |
---|---|---|---|---|---|---|
Original ESGN | SST | 2.85 | 55.58 | 57.57 | 56.66 | |
My reimpl. | DBG | 2.73 | 52.67 | 58.90 | 55.62 | url |
My reimpl. with reranking | DBG | 1.66 | 37.66 | 67.47 | 48.33 | |
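
The F1 column is presumably the harmonic mean of the recall and precision columns; the sketch below shows that computation. The helper name `f1_score` is hypothetical and not taken from the repository. Recomputing from the rounded table entries may differ from the reported F1 in the second decimal place, for instance if the reported value was computed before rounding or averaged differently.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Rows of the table above: (model, avg. recall, avg. precision)
rows = [
    ("Original ESGN", 55.58, 57.57),
    ("My reimpl.", 52.67, 58.90),
    ("My reimpl. with reranking", 37.66, 67.47),
]
for name, recall, precision in rows:
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```
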
Download the pre-trained model, put it into `./save/esgn_c3d_run0`, then run `python eval.py --eval_folder esgn_c3d_run0`.
- Awesome ImageCaptioning.pytorch project.
- Official implementation of "Weakly Supervised Dense Event Captioning in Videos".