InternVideo2 [Paper]

The code and models for 'InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding' will be released soon at this link. Please note that this repository will no longer receive updates or maintenance.

  • Achieved 92.1% top-1 accuracy on Kinetics-400.
  • Achieved state-of-the-art (SOTA) performance on over 60 video- and audio-related tasks (including action recognition, temporal localization, and retrieval) at the time of release.

Updates

  • Mar 22, 2024: The technical report of InternVideo2 is released.