CMU Video-Level Semantic Features
This repository contains the inference code and pre-trained models for the following papers:
Po-Yao Huang, Ye Yuan, Zhenzhong Lan, Lu Jiang, and Alexander G. Hauptmann.
"Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification."
arXiv preprint arXiv:1707.01408 (2017).
Jia Chen, Junwei Liang, Jiang Liu, Shizhe Chen, Chenqiang Gao, Qin Jin, and Alexander Hauptmann.
"Informedia @ TRECVID 2017."
Dependencies
- Python 2.7; TensorFlow >= 1.4.0; tqdm and nltk (for preprocessing)
- Pre-trained models (1.1 GB); extract them into models/
What the code does
Given a list of videos, the pipeline outputs semantic features for each video.
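The .lst files used throughout are assumed to be plain-text lists with one path per line; for example, a videos.lst such as:

```
/data/videos/video0001.mp4
/data/videos/video0002.mp4
```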
Inference
- Extract frames from each video
$ python extractFrames_uniform.py videos.lst frames_path --num_per_video 30
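The uniform sampling is roughly equivalent to the sketch below; OpenCV (cv2) and the JPEG naming scheme are assumptions, not necessarily what extractFrames_uniform.py actually does.

```python
import os
import cv2  # assumption: the real script may use ffmpeg or another decoder

def extract_uniform_frames(video_path, out_dir, num_per_video=30):
    """Save num_per_video uniformly spaced frames of one video as JPEGs."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_per_video, 1)  # spacing between sampled frames
    for n, idx in enumerate(range(0, total, step)[:num_per_video]):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(os.path.join(out_dir, '%06d.jpg' % n), frame)
    cap.release()
```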
- Extract frame-level CNN features
First, set the slimpath variable in img2feat_utils.py to point to your local copy of the TF-Slim model library
$ python img2feat.py frames_path.lst inception_resnet_v2 models/inception_resnet_v2.ckpt frame_feature_path --l2norm --batchSize 30
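For reference, frame features can be pulled from the TF-Slim Inception-ResNet-v2 graph as sketched below. This is only a sketch: the PreLogitsFlatten endpoint, the preprocessing convention, and the load_preprocessed_frames helper are assumptions about what img2feat.py does, and it requires the TF-Slim library (slimpath) on PYTHONPATH.

```python
import numpy as np
import tensorflow as tf
from nets import inception_resnet_v2  # from the TF-Slim library at slimpath

slim = tf.contrib.slim

# Inception-ResNet-v2 expects 299x299 RGB inputs scaled to [-1, 1].
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    _, end_points = inception_resnet_v2.inception_resnet_v2(
        images, num_classes=1001, is_training=False)
features = end_points['PreLogitsFlatten']  # 1536-d pooled feature per frame

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'models/inception_resnet_v2.ckpt')
    # Hypothetical helper returning an array of shape (batch, 299, 299, 3).
    batch = load_preprocessed_frames()
    feats = sess.run(features, feed_dict={images: batch})
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # the --l2norm option
```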
- Average-pool frame features into video-level CNN features
$ python averageFeats.py frame_feature_path.lst video_feature_path --l2norm
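The pooling step itself is simple; a minimal numpy sketch:

```python
import numpy as np

def average_pool(frame_feats, l2norm=True):
    """frame_feats: (num_frames, dim) array of per-frame CNN features."""
    video_feat = frame_feats.mean(axis=0)  # average over the time axis
    if l2norm:  # the --l2norm option
        video_feat /= (np.linalg.norm(video_feat) + 1e-12)
    return video_feat
```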
- Extract semantic features
First, set the modelpath variable in semantics_features.py to point to the extracted models/ directory
$ python semantics_features.py video_feature_path.lst semantic_feat --save_seperate
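Conceptually, this step feeds each video-level feature through the pre-trained semantic models to obtain per-concept scores. The sketch below is a hypothetical illustration only; the checkpoint filename, tensor names, and .npy file layout are placeholders, not the repo's actual graph.

```python
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Restore one of the pre-trained semantic models from models/.
    saver = tf.train.import_meta_graph('models/semantic_model.ckpt.meta')  # assumed filename
    saver.restore(sess, 'models/semantic_model.ckpt')
    graph = tf.get_default_graph()
    feat_in = graph.get_tensor_by_name('input_feat:0')      # placeholder name, assumed
    scores = graph.get_tensor_by_name('concept_scores:0')   # placeholder name, assumed
    video_feat = np.load('video_feature_path/video0001.npy')  # assumed layout: (dim,)
    semantic_feat = sess.run(scores, feed_dict={feat_in: video_feat[None, :]})
```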