Official open-source code for "Masked Autoencoders As Spatiotemporal Learners"

Masked Autoencoders As Spatiotemporal Learners: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders As Spatiotemporal Learners:

@Article{MaskedAutoencodersSpatiotemporal2022,
  author  = {Christoph Feichtenhofer and Haoqi Fan and Yanghao Li and Kaiming He},
  journal = {arXiv:2205.09113},
  title   = {Masked Autoencoders As Spatiotemporal Learners},
  year    = {2022},
}

Another implementation that supports AVA and SSv2 downstream evaluation is available in PySlowFast.

  • This repo is a modification of the MAE repo. Installation and preparation follow INSTALL.md.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.
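Whether the timm fix applies depends on the installed PyTorch version. The following is a minimal sketch of such a check, assuming the "1.8.1+" note above means any version from 1.8.1 onward. The helper names `parse_version` and `needs_timm_fix` are hypothetical and are not part of this repo or of timm.

```python
import re

# Assumption: the fix is needed from this PyTorch version onward,
# following the "PyTorch 1.8.1+" note in this README.
FIX_NEEDED_FROM = (1, 8, 1)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '1.8.1+cu111'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '1.10') is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def needs_timm_fix(torch_version: str) -> bool:
    """Hypothetical helper: True if timm==0.3.2 needs the fix for this PyTorch."""
    return parse_version(torch_version) >= FIX_NEEDED_FROM
```

In an environment with PyTorch installed, this could be called as `needs_timm_fix(torch.__version__)`.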

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code + testing code
  • Pre-training code

Visualization demo

Visualization of MAE output with 95% (left) and 98% (right) mask rate on the same video.
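To give a sense of how few tokens remain visible at these mask rates, here is a minimal sketch of uniform random masking over token indices. It uses only the Python standard library rather than the repo's PyTorch code. The function name `random_masking` as written here and the 8 x 14 x 14 token grid are illustrative assumptions, not taken from this README.

```python
import random


def random_masking(num_tokens: int, mask_ratio: float, seed: int = 0):
    """Split token indices into (kept, masked) by uniform random sampling."""
    # Number of tokens that stay visible to the encoder.
    len_keep = int(num_tokens * (1 - mask_ratio))
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    ids_keep = sorted(order[:len_keep])
    ids_masked = sorted(order[len_keep:])
    return ids_keep, ids_masked


# Assumed token grid: 8 temporal x 14 x 14 spatial patches (illustrative only).
NUM_TOKENS = 8 * 14 * 14

# Mask ratios mentioned in this README.
for ratio in (0.90, 0.95, 0.98):
    kept, masked = random_masking(NUM_TOKENS, ratio)
    print(f"mask ratio {ratio:.2f}: {len(kept)} visible, {len(masked)} masked")
```

The split is uniform over all tokens, so each index is equally likely to be masked regardless of its position in the grid.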

Run our interactive visualization demo using the Colab notebook (no GPU needed).

Fine-tuning with pre-trained checkpoints

The following table lists the pre-trained checkpoints used in the paper. They were pre-trained with a 90% mask ratio for 1600 effective epochs and converted from the PySlowFast codebase:

                                         ViT-Large   ViT-Huge
pre-trained checkpoint on Kinetics-400   download    download
md5                                      edf3a5      3d7f64
pre-trained checkpoint on Kinetics-600   download    download
md5                                      9a9645      27495e
pre-trained checkpoint on Kinetics-700   download    download
md5                                      cdbada      4c4e3c
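The md5 values above can be used to check a downloaded checkpoint. The following is a minimal sketch, assuming the listed values are the leading hexadecimal characters of each file's MD5 digest. The helper names and the file name in the usage note are hypothetical.

```python
import hashlib


def md5_hexdigest(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_prefix(path: str, expected_prefix: str) -> bool:
    """Hypothetical helper: True if the file's MD5 starts with the given prefix."""
    return md5_hexdigest(path).startswith(expected_prefix.lower())
```

For example, `matches_md5_prefix("checkpoint.pth", "edf3a5")` would check a ViT-Large Kinetics-400 download saved under the placeholder name `checkpoint.pth`.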

Fine-tuning instructions are in FINETUNE.md.

Pre-training

Pre-training instructions are in PRETRAIN.md.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.