rixejzvdl649

Pinned Repositories

lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Language: Python 1.8k 3 176139
LaViLa
Code release for "Learning Video Representations from Large Language Models"
Language: Python 489 8 3645
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Language: Python 1.2k 14 120107
VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Language: Python 212 5 2615
MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
Language: Python 363 14 159
DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Language: C++ 5.1k 92 1.6k619
InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python 1.4k 28 18485
VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Language: Python 510 5 5958
open_clip
An open source implementation of CLIP.
Language: Jupyter Notebook 00
Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Language: Python 282 5 3011

rixejzvdl649's Repositories

rixejzvdl649/open_clip
An open source implementation of CLIP.
Language: Jupyter Notebook 00