
PyTorch 3D video classification models pre-trained on 65 million Instagram videos


IG-65M PyTorch

Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65 million Instagram videos). The official Facebook Research Caffe2 models and weights are available here.

PyTorch and ONNX Models 🏆

We provide converted .pth and .pb PyTorch and ONNX weights, respectively.

| Model | Pretrain+Finetune | Input Size | pth | onnx | caffe2 |
|---|---|---|---|---|---|
| R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch_9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch_748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl |
| R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m_0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m_625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl |
| R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch_449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch_e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl |
| R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m_ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m_10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl |
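
The .pth and .pb file names end in an eight-character hexadecimal suffix. This README does not say what the suffix is. The sketch below assumes it is the leading part of the file's SHA-256 digest, a common convention for published weights, and checks a downloaded file against it. The helper names (`expected_prefix`, `sha256_hex`, `matches_name`) are hypothetical and not part of this repository.

```python
import hashlib
import re
from pathlib import Path

# Assumption: the trailing 8 hex characters in a weight file name are the
# leading digits of the file's SHA-256 digest. The README does not state this.
SUFFIX_RE = re.compile(r"[_-]([0-9a-f]{8})\.(?:pth|pb)$")


def expected_prefix(filename):
    """Return the hex suffix embedded in the file name, or None if absent."""
    match = SUFFIX_RE.search(filename)
    return match.group(1) if match else None


def sha256_hex(path, chunk_size=1 << 20):
    """Hash the file in chunks so large weight files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_name(path):
    """True if the file's SHA-256 digest starts with the suffix in its name."""
    prefix = expected_prefix(Path(path).name)
    return prefix is not None and sha256_hex(path).startswith(prefix)
```

If `matches_name` returns False for a freshly downloaded file, either the download is corrupt or the assumption about the suffix does not hold for that file.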


  • ONNX models provided here have not been optimized for inference.
  • Models fine-tuned on Kinetics have 400 classes; the plain IG-65M models have 487 classes (32-frame clips) and 359 classes (8-frame clips).
  • For models fine-tuned on Kinetics you can use the labels from here.
  • For plain IG-65M models there is no label map available.
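
The clip lengths and class counts above can be kept in a small lookup table so that input shapes and label maps are checked before inference. The following is a minimal sketch in plain Python. The dictionary keys are the weight file stems from the table; the channels-first RGB layout and the helper names (`input_shape`, `softmax`, `top_k`) are assumptions, not part of this repository.

```python
import math

# Frames per clip and class counts as listed in the table and notes above.
VARIANTS = {
    "r2plus1d_34_clip8_ig65m_from_scratch": {"frames": 8, "classes": 359},
    "r2plus1d_34_clip8_ft_kinetics_from_ig65m": {"frames": 8, "classes": 400},
    "r2plus1d_34_clip32_ig65m_from_scratch": {"frames": 32, "classes": 487},
    "r2plus1d_34_clip32_ft_kinetics_from_ig65m": {"frames": 32, "classes": 400},
}


def input_shape(variant, batch=1):
    """Clip shape as (batch, channels, frames, height, width).

    Assumption: RGB input in channels-first layout.
    """
    return (batch, 3, VARIANTS[variant]["frames"], 112, 112)


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs, best first."""
    if len(logits) != len(labels):
        raise ValueError("label map size does not match the number of classes")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a Kinetics fine-tuned model, `labels` would be the 400-entry Kinetics label list. For the plain IG-65M models only class indices are meaningful, since no label map is available.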

Usage 💻

The following describes how to use the model in your own project and how to use our conversion and extraction tools.

In Your Own Project

  • See convert.py and copy the r2plus1d_34 model architecture definition
  • See extract.py for how to load the corresponding weights into the model

Note: we require torchvision v0.4 or later for the model architecture building blocks.
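
Once the architecture definition has been copied, loading the weights might look roughly like the sketch below. It is not a copy of extract.py. It assumes that each .pth file holds a plain state dict whose keys match the model's parameter names, and the helper names `load_weights` and `classify_clip` are hypothetical. Only standard PyTorch calls (`torch.load`, `load_state_dict`, `torch.no_grad`) are used.

```python
def load_weights(model, weights_path):
    """Load a converted .pth file into an already constructed model.

    Assumption: the .pth file holds a plain state dict whose keys match
    the model's parameter names.
    """
    import torch  # imported lazily so this module can be imported without PyTorch

    state = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()  # inference mode: BatchNorm uses its running statistics
    return model


def classify_clip(model, clip):
    """Run one forward pass and return the predicted class index per batch item.

    `clip` is expected to be a float tensor; for the models in this README
    that would be shaped (batch, 3, frames, 112, 112) under the assumed layout.
    """
    import torch

    with torch.no_grad():
        logits = model(clip)
    return logits.argmax(dim=1)
```

Here `model` would be an instance built from the `r2plus1d_34` definition in convert.py; how that definition is called, including its arguments, is set by convert.py and is not assumed here. The number of classes must match the chosen weights, otherwise `load_state_dict` raises a size mismatch error.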

Development and Tools

We provide CPU and nvidia-docker based GPU Dockerfiles for self-contained and reproducible environments.

Use the convenience Makefile to build the Docker image and then get into the container mounting a host directory to /data inside the container:

make run datadir=/Path/To/My/Videos

By default we build and run the CPU Docker images; for GPUs run:

make dockerfile=Dockerfile.gpu
make gpu

The WebcamDataset requires exposing /dev/video0 to the container, which only works on Linux:

make webcam

Convert Weights 🍝

Build the Docker image and get into the container as described above. Then see the convert.py tool's --help and its source.

Extract Features 🍪

Build the Docker image and get into the container as described above. Then see the extract.py tool's --help and its source.

References 📖

  1. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
  2. D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
  3. D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
  4. VMZ: Model Zoo for Video Modeling
  5. Kinetics & IG-65M


Copyright © 2019 MoabitCoin

Distributed under the MIT License (MIT).