IG-65M PyTorch
Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65 million Instagram videos).
IG-65M activations for the Primer movie trailer video; time goes top to bottom
IG-65M video deep dream: maximizing activations; for more see this pull request
💻 Usage
The following describes how to use the model in your own project and how to use our conversion and extraction tools.
PyTorch Models
We provide convenient PyTorch Hub integration:
>>> import torch
>>>
>>> torch.hub.list("moabitcoin/ig65m-pytorch")
['r2plus1d_34_32_ig65m', 'r2plus1d_34_32_kinetics', 'r2plus1d_34_8_ig65m', 'r2plus1d_34_8_kinetics']
>>>
>>> model = torch.hub.load("moabitcoin/ig65m-pytorch", "r2plus1d_34_32_ig65m", num_classes=359, pretrained=True)
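The `num_classes` argument has to match the checkpoint being loaded: the Kinetics fine-tuned models have 400 classes, while the plain IG-65M models have 359 classes (32-frame clips) or 487 classes (8-frame clips). Below is a minimal sketch of a helper that derives the value from the hub entry name. `num_classes_for` and `load_model` are hypothetical names used for illustration and are not part of this repository.

```python
def num_classes_for(entry: str) -> int:
    """Return the class count for a hub entry such as 'r2plus1d_34_32_ig65m'."""
    # Entry names look like r2plus1d_<depth>_<clip frames>_<dataset>.
    _, _, frames, dataset = entry.split("_")
    if dataset == "kinetics":
        return 400
    if dataset == "ig65m":
        return {"32": 359, "8": 487}[frames]
    raise ValueError(f"unknown entry: {entry}")


def load_model(entry: str):
    """Load a pre-trained model via PyTorch Hub (downloads weights on first use)."""
    import torch  # imported lazily so num_classes_for works without PyTorch

    return torch.hub.load(
        "moabitcoin/ig65m-pytorch",
        entry,
        num_classes=num_classes_for(entry),
        pretrained=True,
    )
```

Calling `load_model("r2plus1d_34_8_kinetics")` would then pass `num_classes=400` without you having to look the number up.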
Tools
We build and publish Docker images (see all tags) via Travis CI/CD for master and for all releases.
In these images we provide the following tools:
- convert: to convert Caffe2 blobs to a PyTorch model and weights
- extract: to compute clip features for a video with a pre-trained model
- semcode: to visualize clip features for a video over time
- index-build: to build an approximate nearest neighbor index from clip features
- index-serve: to load an approximate nearest neighbor index and serve queries
- index-query: to make approximate nearest neighbor queries against an index server
Run these pre-built images via:
docker run moabitcoin/ig65m-pytorch:latest-cpu --help
Example for running on CPUs:
docker run --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-cpu \
extract /data/myvideo.mp4 /data/myfeatures.npy
Example for running on GPUs via nvidia-docker:
docker run --runtime=nvidia --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-gpu \
extract /data/myvideo.mp4 /data/myfeatures.npy
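If you want to drive the `extract` tool from a script, for example to process many videos, the two commands above can be assembled programmatically. The following is a rough sketch only: `extract_command` is a hypothetical helper, and it assumes the video and the output file live in the same host directory, which is mounted to /data as in the examples above.

```python
import shlex
from pathlib import Path


def extract_command(video: Path, features: Path, gpu: bool = False) -> list[str]:
    """Build the `docker run ... extract` argument list for one video."""
    video, features = video.resolve(), features.resolve()
    if video.parent != features.parent:
        raise ValueError("video and features must share one directory to mount")
    cmd = ["docker", "run"]
    if gpu:
        cmd.append("--runtime=nvidia")  # GPU variant, as in the nvidia-docker example
    tag = "latest-gpu" if gpu else "latest-cpu"
    cmd += [
        "--ipc=host",
        "-v", f"{video.parent}:/data",
        f"moabitcoin/ig65m-pytorch:{tag}",
        "extract", f"/data/{video.name}", f"/data/{features.name}",
    ]
    return cmd


if __name__ == "__main__":
    cmd = extract_command(Path("myvideo.mp4"), Path("myfeatures.npy"))
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```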
Development
We provide CPU and nvidia-docker based GPU Dockerfiles for self-contained and reproducible environments.
Use the convenience Makefile to build the Docker image and then get into the container, mounting a host directory to /data inside the container:
make
make run datadir=/Path/To/My/Videos
By default we build and run the CPU Docker images; for GPUs run:
make dockerfile=Dockerfile.gpu
make gpu
The WebcamDataset requires exposing /dev/video0 to the container, which will only work on Linux:
make
make webcam
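Before running the webcam target it can help to confirm that the host actually meets these two conditions. A small sketch of such a check; `webcam_available` is a hypothetical helper and not part of this repository.

```python
import os
import sys


def webcam_available(device: str = "/dev/video0") -> bool:
    """Return True if the host is Linux and the video device node exists."""
    return sys.platform.startswith("linux") and os.path.exists(device)


if __name__ == "__main__":
    if webcam_available():
        print("webcam device found; `make webcam` should be able to expose it")
    else:
        print("no /dev/video0 on this host (or not running Linux)")
```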
🏆 PyTorch and ONNX Models
We provide converted .pth and .pb PyTorch and ONNX weights, respectively.
Model | Pretrain+Finetune | Input Size | pth | onnx | caffe2 |
---|---|---|---|---|---|
R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch-9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch-748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl |
R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m-0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m-625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl |
R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch-449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch-e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl |
R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m-ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m-10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl |
Notes
- ONNX models provided here have not been optimized for inference.
- Models fine-tuned on Kinetics have 400 classes; the plain IG-65M models have 359 classes (32-frame clips) and 487 classes (8-frame clips).
- For models fine-tuned on Kinetics you can use the labels from here.
- For plain IG-65M models there is no label map available.
- Official Facebook Research Caffe2 models are here.
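The .pth and .pb file names in the table above end in an eight-character hexadecimal suffix. Assuming this follows the PyTorch Hub naming convention, where the suffix is the leading digits of the file's SHA-256 hash (an assumption, not something stated here), a downloaded file could be checked roughly as follows. `has_matching_hash` is a hypothetical helper name.

```python
import hashlib
import re
from pathlib import Path

# Matches e.g. "name-9bae36ae.pth": a dash, 8+ hex digits, then the extension.
HASH_SUFFIX = re.compile(r"-([0-9a-f]{8,})\.[^.]+$")


def has_matching_hash(path: Path) -> bool:
    """Check that the hex suffix in the file name prefixes the file's SHA-256."""
    match = HASH_SUFFIX.search(path.name)
    if match is None:
        raise ValueError(f"no hash suffix in file name: {path.name}")
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))
```

The Caffe2 .pkl files carry no such suffix, so the helper raises `ValueError` for them instead of guessing.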
📖 References
- D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
- D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
- D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
- VMZ: Model Zoo for Video Modeling
- Kinetics & IG-65M
License
Copyright © 2019 MoabitCoin
Distributed under the MIT License (MIT).