InternVideo2 [Paper]
The code and models for 'InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding' are scheduled to be released soon at this link. Please note that this repository will no longer receive updates or maintenance.
- Achieved 92.1% Top-1 accuracy on Kinetics 400.
- Achieved SOTA performance on over 60 video/audio-related tasks (including action recognition, temporal localization, retrieval, etc.) when released.
Mar 22, 2024: The technical report of InternVideo2 is released.