
Omnivore: A Single Model for Many Visual Modalities


Omnivorous model architectures for image and video classification and SSL

This repository contains pretrained PyTorch models and inference examples for the following papers (a brief loading sketch follows the citations):

Omnivore: A Single Model for Many Visual Modalities, CVPR 2022 [bib]
@inproceedings{girdhar2022omnivore,
  title={{Omnivore: A Single Model for Many Visual Modalities}},
  author={Girdhar, Rohit and Singh, Mannat and Ravi, Nikhila and van der Maaten, Laurens and Joulin, Armand and Misra, Ishan},
  booktitle={CVPR},
  year={2022}
}
OmniMAE: Single Model Masked Pretraining on Images and Videos [bib]
@inproceedings{girdhar2022omnimae,
  title={{OmniMAE: Single Model Masked Pretraining on Images and Videos}},
  author={Girdhar, Rohit and El-Nouby, Alaa and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
  booktitle={TODO},
  year={2022}
}
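
To give a feel for the inference workflow, here is a minimal sketch of loading a pretrained model through torch.hub. The facebookresearch/omnivore hub path, the omnivore_swinB model name, and the input_type keyword are assumptions for illustration; check the inference notebooks in this repository for the exact API.

import torch

# Load a pretrained Omnivore model via torch.hub (assumed entrypoint and
# model name; see the inference notebooks for the exact call).
model = torch.hub.load("facebookresearch/omnivore:main", model="omnivore_swinB")
model.eval()

# Omnivore treats images as single-frame videos: B x C x T x H x W.
image = torch.randn(1, 3, 1, 224, 224)  # one 224x224 RGB image, T=1
with torch.no_grad():
    logits = model(image, input_type="image")  # input_type is an assumed kwarg
print(logits.shape)  # e.g. torch.Size([1, 1000]) for ImageNet-1k classes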

Contributing

We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more information.

License

Omnivore is released under the CC-BY-NC 4.0 license. See LICENSE for additional details. However, the Swin Transformer implementation is additionally licensed under the Apache 2.0 license (see NOTICE for more information).