requirements

  • cuda
  • pytorch 0.3.1
  • python3 (untested) or python2 (tested; best to stick with python2 throughout)
  • ffmpeg (can install using anaconda)

usage

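  1. 2D feature extraction, e.g. resnet101, nasnet
sh ./2d_extract_feat.sh
# model: which model to use
# n_frame_steps: number of frames to extract per video; the default of 80 is a reasonable choice
  2. 3D feature extraction
cd c3d_feat_extract
sh ./c3d_feat_extract.sh
# --mode feature: feature-extraction mode, no need to change
# change the options below according to the chosen model
# --model_name resnext \
# --model_depth 101 \
# --resnext_cardinality 32 \
# --resnet_shortcut B \
# --model pretrained_models/resnext-101-64f-kinetics.pth
  3. Training
./train_s2vt.sh
# set according to your configuration; see opts.py for what each option means
  4. Testing and scoring
./eval_s2vt.sh
# set according to your configuration; see eval.py for what each option means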
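
The n_frame_steps option controls how many frames are taken from each video. A minimal sketch of one common way to do this, evenly spaced sampling, follows. The helper name sample_frame_indices is hypothetical, and the extraction script itself may pick frames differently.

```python
import numpy as np


def sample_frame_indices(n_total_frames, n_frame_steps=80):
    """Pick n_frame_steps evenly spaced frame indices from a video.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    if n_total_frames <= 0:
        raise ValueError("video has no frames")
    # linspace spreads the samples over the whole clip, from first to last frame
    positions = np.linspace(0, n_total_frames - 1, n_frame_steps)
    # round to the nearest real frame; short videos will repeat some indices
    return np.round(positions).astype(int)


indices = sample_frame_indices(n_total_frames=300, n_frame_steps=80)
print(indices.shape)
```

With this scheme every video yields the same number of frames regardless of its length, which keeps the per-video feature arrays the same shape.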

file tree

Download the related files: link: https://pan.baidu.com/s/1RDNygrWtz_PtVH8nh4vG3w password: nxyk

data
│   all_caption.json
│   all_info.json    
│   all_videodatainfo_2017.json
└───feats
│   └───nasnet
│   │   │   videoxxx.npy
│   │   │   ...
│   └───resnet
│   │   │   videoxxx.npy
│   │   │   ... 
│   └───xxnet
│       │   videoxxx.npy
│       │   ... 
└───videos
│   │   videoxxx.mp4
│   │   ...

Create these directories:
log
checkpoint
result
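
The three directories can also be created from Python. This is a small sketch that works on both python2 and python3; the helper name make_output_dirs is hypothetical, and it assumes the directories belong in the repository root.

```python
import os

# Directory names taken from the list above
OUTPUT_DIRS = ("log", "checkpoint", "result")


def make_output_dirs(root="."):
    """Create log/, checkpoint/ and result/ under root if they are missing."""
    created = []
    for name in OUTPUT_DIRS:
        path = os.path.join(root, name)
        # isdir check instead of exist_ok, which python2 does not support
        if not os.path.isdir(path):
            os.makedirs(path)
        created.append(path)
    return created


if __name__ == "__main__":
    print(make_output_dirs("."))
```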

PyTorch implementation of video captioning

Installing PyTorch and the Python packages with Anaconda is recommended.

python packages

  • tqdm
  • pillow
  • pretrainedmodels
  • nltk

Data

MSR-VTT. The test videos don't have captions, so I split the training videos into train/val/test. Extract them and put them in the ./data/ directory.
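
A reproducible split of the captioned videos could look like the sketch below. The function name split_video_ids, the 10% validation and test fractions, and the seed are all assumptions for illustration; the split actually used for the provided data files is not stated here.

```python
import random


def split_video_ids(video_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle video ids deterministically and cut them into three disjoint splits.

    Hypothetical helper; the fractions are placeholders, not the repository's values.
    """
    ids = sorted(video_ids)
    # a private Random instance keeps the split stable across runs
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }


video_ids = ["video%d" % i for i in range(100)]
splits = split_video_ids(video_ids)
print({name: len(ids) for name, ids in splits.items()})
```

Fixing the seed means the same videos land in the same split every time, so validation and test scores stay comparable between runs.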

Options

All default options are defined in opt.py or the corresponding code file; change them as you like.

Usage

(Optional) c3d features

You can use video-classification-3d-cnn-pytorch to extract features from the videos, then mean-pool them to get a 2048-dimensional feature for each video.
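
The mean-pooling step can be sketched with NumPy as follows. It assumes the extractor yields one 2048-dimensional vector per clip, stacked into an array of shape (n_clips, 2048); the function name and the random input are placeholders.

```python
import numpy as np


def mean_pool_clip_features(clip_features):
    """Average per-clip features into a single vector for the whole video."""
    clip_features = np.asarray(clip_features, dtype=np.float32)
    if clip_features.ndim != 2:
        raise ValueError("expected a (n_clips, feature_dim) array")
    # average over the clip axis, keeping the feature axis
    return clip_features.mean(axis=0)


# Stand-in for real extractor output: 12 clips, 2048 features each
clip_features = np.random.rand(12, 2048).astype(np.float32)
video_feature = mean_pool_clip_features(clip_features)
print(video_feature.shape)
```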

Steps

  1. Preprocess videos and labels

    This step takes about 3 hours for the MSR-VTT dataset on one Titan Xp GPU.

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152 --n_frame_steps 40 --gpu 4,5

python prepro_vocab.py
  2. Train a model
python train.py --gpu 5,6,7 --epochs 9001 --batch_size 450 --checkpoint_path data/save --feats_dir data/feats/resnet152 --dim_vid 2048 --model S2VTAttModel
  3. Test

    opt_info.json will be in the same directory as the saved model.

python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_1000.pth --batch_size 100 --gpu 1,0
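
Before training, it can help to confirm that the extracted features match the value passed to --dim_vid. The sketch below assumes each .npy file under the features directory holds an array whose last dimension is the feature size; the helper name find_mismatched_features is hypothetical.

```python
import glob
import os

import numpy as np


def find_mismatched_features(feats_dir, dim_vid):
    """Return (path, shape) for every .npy file whose last dimension is not dim_vid."""
    mismatched = []
    for path in sorted(glob.glob(os.path.join(feats_dir, "*.npy"))):
        feat = np.load(path)
        # a 0-d array has no feature axis at all, so count it as a mismatch
        if feat.ndim == 0 or feat.shape[-1] != dim_vid:
            mismatched.append((path, feat.shape))
    return mismatched


if __name__ == "__main__":
    bad = find_mismatched_features("data/feats/resnet152", dim_vid=2048)
    print("%d feature files do not match --dim_vid" % len(bad))
```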

Metrics

I forked XgDuan's coco-caption. Thanks to XgDuan for porting it to Python 3.

TODO

  • lstm
  • beam search
  • reinforcement learning

Note

This repository is no longer maintained; please see my other repository, video-caption-openNMT.py. It has higher performance and test scores.