Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65M Instagram videos). The official Facebook Research Caffe2 models and weights are available here.
We provide converted .pth (PyTorch) and .pb (ONNX) weights.
Model | Pretrain+Finetune | Input Size | pth | onnx | caffe2 |
---|---|---|---|---|---|
R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch_9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch_748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl |
R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m_0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m_625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl |
R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch_449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch_e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl |
R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m_ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m_10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl |
Notes
- ONNX models provided here have not been optimized for inference.
- Models fine-tuned on Kinetics have 400 classes; the plain IG65M models have 487 classes (32-frame clips) and 359 classes (8-frame clips).
- For models fine-tuned on Kinetics you can use the labels from here.
- For the plain IG65M models no label map is available.
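All of these models take clips of shape (batch, channels, frames, height, width), i.e. 1x3x8x112x112 for the 8-frame variants. As a hedged sketch of the input pipeline, the following normalizes a clip using the Kinetics statistics from torchvision's video transforms; check the repo's own transforms to confirm it uses the same values:

```python
import torch

# Normalization statistics from torchvision's Kinetics video transforms;
# ASSUMPTION: the same stats apply to these models.
MEAN = torch.tensor([0.43216, 0.394666, 0.37645])
STD = torch.tensor([0.22803, 0.22145, 0.216989])

def preprocess(frames):
    """frames: uint8 tensor of shape (T, H, W, C) with values in [0, 255].
    Returns a float tensor of shape (1, C, T, H, W) ready for the model."""
    clip = frames.float() / 255.0    # scale to [0, 1]
    clip = (clip - MEAN) / STD       # per-channel normalization
    clip = clip.permute(3, 0, 1, 2)  # (T, H, W, C) -> (C, T, H, W)
    return clip.unsqueeze(0)         # add batch dimension

clip = preprocess(torch.randint(0, 256, (8, 112, 112, 3), dtype=torch.uint8))
print(clip.shape)  # torch.Size([1, 3, 8, 112, 112])
```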
The following describes how to use the models in your own project and how to use our conversion and extraction tools.
- See convert.py and copy the r2plus1d_34 model architecture definition
- See extract.py for how to load the corresponding weights into the model
Note: we require torchvision v0.4 or later for the model architecture building blocks
We provide CPU and nvidia-docker based GPU Dockerfiles for self-contained and reproducible environments.
Use the convenience Makefile to build the Docker image, then get into the container, mounting a host directory to /data inside the container:

```
make
make run datadir=/Path/To/My/Videos
```
By default we build and run the CPU Docker images; for GPUs run:

```
make dockerfile=Dockerfile.gpu
make gpu
```
The WebcamDataset requires exposing /dev/video0 to the container, which will only work on Linux:

```
make
make webcam
```
Build the Docker image and get into the container as described above. Then see the convert.py tool's --help and its source.
Build the Docker image and get into the container as described above. Then see the extract.py tool's --help and its source.
- D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
- D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
- D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
- VMZ: Model Zoo for Video Modeling
- Kinetics & IG-65M
Copyright © 2019 MoabitCoin
Distributed under the MIT License (MIT).