Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65 million Instagram videos).
IG-65M activations for the Primer movie trailer video; time goes top to bottom
IG-65M video deep dream: maximizing activations; for more see this pull request
The following describes how to use the model in your own project and how to use our conversion and extraction tools.
We provide convenient PyTorch Hub integration:
>>> import torch
>>>
>>> torch.hub.list("moabitcoin/ig65m-pytorch")
['r2plus1d_34_32_ig65m', 'r2plus1d_34_32_kinetics', 'r2plus1d_34_8_ig65m', 'r2plus1d_34_8_kinetics']
>>>
>>> model = torch.hub.load("moabitcoin/ig65m-pytorch", "r2plus1d_34_32_ig65m", num_classes=359, pretrained=True)
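The loaded model is a regular torch.nn.Module and expects clips shaped batch x channels x frames x height x width. A minimal sketch of a forward pass on a random 32-frame clip (the random input and the missing normalization are just for illustration):
>>> model = model.eval()
>>> clip = torch.rand(1, 3, 32, 112, 112)  # batch x channels x frames x height x width
>>> with torch.no_grad():
...     logits = model(clip)
>>> logits.shape
torch.Size([1, 359])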
We build and publish Docker images (see all tags) via Travis CI/CD for master and for all releases.
In these images we provide the following tools:
- convert - to convert Caffe2 blobs to PyTorch model and weights
- extract - to compute clip features for a video with a pre-trained model
- semcode - to visualize clip features for a video over time
- index-build - to build an approximate nearest neighbor index from clip features
- index-serve - to load an approximate nearest neighbor index and serve queries
- index-query - to make approximate nearest neighbor queries against an index server
Run these pre-built images via:
docker run moabitcoin/ig65m-pytorch:latest-cpu --help
Example for running on CPUs:
docker run --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-cpu \
extract /data/myvideo.mp4 /data/myfeatures.npy
Example for running on GPUs via nvidia-docker:
docker run --runtime=nvidia --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-gpu \
extract /data/myvideo.mp4 /data/myfeatures.npy
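The resulting /data/myfeatures.npy can then be loaded with NumPy. Assuming extract writes a 2D array with one feature vector per clip (the exact layout may differ), you can also build and query a nearest neighbor index over the clip features yourself, which is the idea behind the index-* tools. The sketch below uses faiss with a plain exact L2 index; the actual tools build an approximate index and may use a different backend:
>>> import numpy as np
>>> import faiss
>>> features = np.load("myfeatures.npy").astype("float32")   # one row per clip (assumed layout)
>>> index = faiss.IndexFlatL2(features.shape[1])             # exact L2 index, for simplicity
>>> index.add(features)
>>> distances, neighbors = index.search(features[:1], 5)     # 5 nearest clips to the first clip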
We provide CPU and nvidia-docker based GPU Dockerfiles for self-contained and reproducible environments.
Use the convenience Makefile to build the Docker image and then get into the container, mounting a host directory to /data inside the container:
make
make run datadir=/Path/To/My/Videos
By default we build and run the CPU Docker images; for GPUs run:
make dockerfile=Dockerfile.gpu
make gpu
The WebcamDataset requires exposing /dev/video0 to the container, which will only work on Linux:
make
make webcam
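Outside of Docker, a rough illustration of feeding webcam frames to the model (this is not the repo's WebcamDataset; it is a sketch using OpenCV that skips input normalization and assumes the camera delivers a frame on every read):
>>> import cv2
>>> import numpy as np
>>> import torch
>>> cap = cv2.VideoCapture(0)   # /dev/video0
>>> frames = []
>>> for _ in range(32):
...     _, frame = cap.read()
...     frames.append(cv2.resize(frame, (112, 112))[:, :, ::-1])   # BGR -> RGB
...
>>> cap.release()
>>> clip = torch.as_tensor(np.stack(frames)).permute(3, 0, 1, 2)[None].float() / 255.0
>>> with torch.no_grad():
...     logits = model(clip)   # model from the PyTorch Hub example above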
We provide converted weights: .pth for PyTorch and .pb for ONNX (see the sketch after the table for running the ONNX files).
Model | Pretrain+Finetune | Input Size (frames x height x width) | pth | onnx | caffe2 |
---|---|---|---|---|---|
R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch-9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch-748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl |
R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m-0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m-625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl |
R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch-449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch-e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl |
R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m-ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m-10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl |
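The .pb files are serialized ONNX graphs and can be run, for example, with onnxruntime. The input name is read from the graph, but the fixed batch size of 1 and the output layout in the comment below are assumptions, so check the graph's metadata:
>>> import numpy as np
>>> import onnxruntime as ort
>>> sess = ort.InferenceSession("r2plus1d_34_clip8_ft_kinetics_from_ig65m-625d61b3.pb",
...                             providers=["CPUExecutionProvider"])
>>> name = sess.get_inputs()[0].name
>>> clip = np.random.rand(1, 3, 8, 112, 112).astype(np.float32)   # batch x channels x frames x height x width
>>> outputs = sess.run(None, {name: clip})
>>> logits = outputs[0]   # expected shape (1, 400): class scores for the Kinetics fine-tuned model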
Notes
- ONNX models provided here have not been optimized for inference.
- Models fine-tuned on Kinetics have 400 classes; the plain IG-65M models have 359 classes (32-frame clips) and 487 classes (8-frame clips).
- For models fine-tuned on Kinetics you can use the labels from here (see the sketch after these notes).
- For the plain IG-65M models no label map is available.
- Official Facebook Research Caffe2 models are here.
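For example, continuing the forward-pass sketch above but with a Kinetics fine-tuned model (e.g. r2plus1d_34_32_kinetics loaded with num_classes=400), mapping logits to a class name looks roughly like this; the file name kinetics_400_labels.txt is hypothetical, standing in for whichever label file you downloaded, with one class name per line:
>>> with open("kinetics_400_labels.txt") as f:   # hypothetical file name
...     labels = [line.strip() for line in f]
>>> probs = logits.softmax(dim=1)                # logits from a Kinetics fine-tuned model
>>> predicted = labels[int(probs.argmax())]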
- D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
- D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
- D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
- VMZ: Model Zoo for Video Modeling
- Kinetics & IG-65M
Copyright © 2019 MoabitCoin
Distributed under the MIT License (MIT).