requirements

cuda
pytorch 0.3.1
python3 (untested) or python2 (tested; it is best to stick with python2 throughout)
ffmpeg (can be installed using Anaconda)

usage

2D feature extraction (e.g. resnet101, nasnet)

sh ./2d_extract_feat.sh
# model: which model to use
# n_frame_steps: number of frames to extract from each video; 80 is a good default
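To illustrate what n_frame_steps controls, here is a minimal sketch that picks that many evenly spaced frame indices from a video. It assumes uniform sampling across the whole video; the function name sample_frame_indices is hypothetical, and the actual extraction script may sample frames differently.

```python
import numpy as np


def sample_frame_indices(num_frames, n_frame_steps=80):
    """Hypothetical helper: return n_frame_steps frame indices spread evenly over a video.

    Assumes uniform sampling; videos shorter than n_frame_steps repeat frames.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    # Evenly spaced positions from the first frame (0) to the last (num_frames - 1)
    idx = np.linspace(0, num_frames - 1, n_frame_steps)
    return np.round(idx).astype(int)
```

For example, a 600-frame video with n_frame_steps=80 yields 80 indices from 0 to 599; only the frames at those indices would be passed through the 2D CNN.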

3D feature extraction

cd c3d_feat_extract
sh ./c3d_feat_extract.sh
# --mode feature: feature-extraction mode, no need to change
# change the options below according to the chosen model
# --model_name resnext \
# --model_depth 101 \
# --resnext_cardinality 32 \
# --resnet_shortcut B \
# --model pretrained_models/resnext-101-64f-kinetics.pth

Training

./train_s2vt.sh
# adjust the settings for your setup; see opts.py for what each option means

Testing and scoring

./eval_s2vt.sh
# adjust the settings for your setup; see eval.py for what each option means

file tree

Download link for the related files: https://pan.baidu.com/s/1RDNygrWtz_PtVH8nh4vG3w password: nxyk

data
│   all_caption.json
│   all_info.json    
│   all_videodatainfo_2017.json
└───feats
│   └───nasnet
│   │   │   videoxxx.npy
│   │   │   ...
│   └───resnet
│   │   │   videoxxx.npy
│   │   │   ... 
│   └───xxnet
│       │   videoxxx.npy
│       │   ... 
└───videos
│   │   videoxxx.mp4
│   │   ...
│
│
Create these directories:
log
checkpoint
result
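A small sketch for creating these directories, equivalent to `mkdir -p log checkpoint result`. It assumes they belong in the repository root next to data; the helper name make_dirs is hypothetical. It avoids the exist_ok argument of os.makedirs so it also runs under python2.

```python
import os

# Directories listed above; assumed to live in the repository root.
REQUIRED_DIRS = ["log", "checkpoint", "result"]


def make_dirs(root="."):
    """Hypothetical helper: create the working directories if they are missing."""
    created = []
    for name in REQUIRED_DIRS:
        path = os.path.join(root, name)
        # Check first instead of exist_ok=True, which python2 does not support
        if not os.path.isdir(path):
            os.makedirs(path)
        created.append(path)
    return created
```

Running it twice is harmless: existing directories are left untouched.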

pytorch implementation of video captioning

It is recommended to install PyTorch and the Python packages using Anaconda.

python packages

tqdm
pillow
pretrainedmodels
nltk

Data

MSR-VTT. The test videos don't have captions, so I split the train videos into train/val/test. Extract them and put them in the ./data/ directory.
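One way such a train/val/test split could be made is a seeded shuffle of the train-video IDs followed by slicing. This is only a sketch: the function name split_ids and the val/test fractions are hypothetical, not the split actually used in this repository.

```python
import random


def split_ids(video_ids, val_frac=0.05, test_frac=0.3, seed=0):
    """Hypothetical helper: split video IDs into disjoint train/val/test lists.

    The fractions are illustrative assumptions, not the repository's values.
    """
    ids = sorted(video_ids)       # sort first so the result is reproducible
    rng = random.Random(seed)     # fixed seed -> same split on every run
    rng.shuffle(ids)
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }
```

Fixing the seed matters: the vocabulary, training, and evaluation steps must all see the same split.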

train-video: download link
test-video: download link
json info of train-video: download link
json info of test-video: download link

Options

All default options are defined in opts.py or the corresponding code file; change them as you like.

Usage

(Optional) c3d features

You can use video-classification-3d-cnn-pytorch to extract features from the videos, then mean-pool them to get a 2048-dim feature for each video.
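A minimal sketch of the mean-pooling step, assuming the 3D CNN has already produced an array of per-clip features of shape (num_clips, 2048) for one video. How you read that array from the extraction tool's output is not shown. The function names and the example output directory data/feats/c3d are hypothetical; the videoxxx.npy naming follows the file tree.

```python
import os

import numpy as np


def mean_pool(clip_feats):
    """Average per-clip features of shape (num_clips, dim) into one (dim,) vector."""
    feats = np.asarray(clip_feats, dtype=np.float32)
    if feats.ndim != 2:
        raise ValueError("expected a (num_clips, dim) array")
    return feats.mean(axis=0)


def save_video_feat(vec, out_dir, video_id):
    """Hypothetical helper: save one video's feature as <out_dir>/<video_id>.npy."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    path = os.path.join(out_dir, video_id + ".npy")
    np.save(path, vec)
    return path
```

Usage might look like `save_video_feat(mean_pool(clips), "data/feats/c3d", "video0")`, where `clips` is the per-clip array for video0.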

Steps

preprocess videos and labels

This step takes about 3 hours for the MSR-VTT dataset using one Titan XP GPU.

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152 --n_frame_steps 40  --gpu 4,5

python prepro_vocab.py
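As a rough picture of what a vocabulary-building step does, the sketch below counts words across all captions, keeps words above a frequency threshold, and maps captions to index sequences. It is not prepro_vocab.py itself. The special tokens, the threshold, and the function names are all assumptions for illustration.

```python
from collections import Counter

# Assumed special tokens; the real script may use different ones.
SPECIAL = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(captions, count_threshold=2):
    """Hypothetical helper: keep words that appear at least count_threshold times."""
    counts = Counter()
    for cap in captions:
        counts.update(cap.lower().split())
    words = sorted(w for w, c in counts.items() if c >= count_threshold)
    itow = SPECIAL + words                        # index -> word
    wtoi = {w: i for i, w in enumerate(itow)}     # word -> index
    return itow, wtoi


def encode(caption, wtoi):
    """Map a caption to indices, wrapping it in <sos>/<eos>; rare words become <unk>."""
    unk = wtoi["<unk>"]
    body = [wtoi.get(w, unk) for w in caption.lower().split()]
    return [wtoi["<sos>"]] + body + [wtoi["<eos>"]]
```

Dropping rare words keeps the output layer small; at decode time they are predicted as <unk>.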

Training a model

python train.py --gpu 5,6,7 --epochs 9001 --batch_size 450 --checkpoint_path data/save --feats_dir data/feats/resnet152 --dim_vid 2048 --model S2VTAttModel
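Because --dim_vid has to match the feature size in --feats_dir (2048 for the resnet152 features above), a quick sanity check before a long training run can save time. This sketch assumes each .npy file holds an array whose last dimension is the feature size; the helper name check_feat_dim is hypothetical.

```python
import glob
import os

import numpy as np


def check_feat_dim(feats_dir, dim_vid=2048, max_files=20):
    """Hypothetical helper: verify that saved features have last dimension dim_vid.

    Assumes one .npy file per video with the feature size on the last axis.
    """
    paths = sorted(glob.glob(os.path.join(feats_dir, "*.npy")))[:max_files]
    if not paths:
        raise IOError("no .npy files in %s" % feats_dir)
    for p in paths:
        feat = np.load(p)
        if feat.shape[-1] != dim_vid:
            raise ValueError("%s has last dim %d, expected %d"
                             % (p, feat.shape[-1], dim_vid))
    return len(paths)  # number of files checked
```

For example, `check_feat_dim("data/feats/resnet152", dim_vid=2048)` should pass before running the training command above.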

test

opt_info.json will be in the same directory as the saved model.

python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_1000.pth --batch_size 100 --gpu 1,0
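Since opt_info.json sits next to the saved model, its path can be derived from the --saved_model path. The sketch below does exactly that and loads the options. The helper name load_saved_opts is hypothetical, and the keys inside the file depend on what was saved at training time.

```python
import json
import os


def load_saved_opts(saved_model):
    """Hypothetical helper: load opt_info.json from the saved model's directory."""
    opt_path = os.path.join(os.path.dirname(saved_model), "opt_info.json")
    with open(opt_path) as f:
        return json.load(f)
```

For the command above, `load_saved_opts("data/save/model_1000.pth")` reads data/save/opt_info.json, the same file passed as --recover_opt.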

Metrics

I forked XgDuan's coco-caption. Thanks to XgDuan for porting it to Python 3.

TODO

lstm
beam search
reinforcement learning

Note

This repository is no longer maintained; please see my other repository, video-caption-openNMT.py, which has higher performance and test scores.

Sundrops/video-caption.pytorch