ViT models pretrained with up to ~5k hours of human-like video data
Primary language: Python
License: MIT