Pinned Repositories
Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
DragGAN
Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; code and models are fully open-sourced; supports Windows, macOS, and Linux)
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal dialogue model approaching GPT-4o performance)
LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow instructions within 1 hour using only 1.2M parameters
OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
VideoMamba
[ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
VisionLLM
VisionLLM Series
OpenGVLab's Repositories
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal dialogue model approaching GPT-4o performance)
OpenGVLab/Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
OpenGVLab/InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
OpenGVLab/InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
OpenGVLab/VisionLLM
VisionLLM Series
OpenGVLab/OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
OpenGVLab/Vision-RWKV
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
OpenGVLab/VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
OpenGVLab/OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
OpenGVLab/EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
OpenGVLab/EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for the EgoVis Challenges in CVPR 2024
OpenGVLab/Hulk
An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"
OpenGVLab/MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
OpenGVLab/GUI-Odyssey
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. It consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combinations.
OpenGVLab/PhyGenBench
Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"
OpenGVLab/PIIP
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
OpenGVLab/InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
OpenGVLab/STM-Evaluation
OpenGVLab/LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
OpenGVLab/vinci
Vinci: A Real-time Embodied Smart Assistant based on an Egocentric Vision-Language Model
OpenGVLab/TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
OpenGVLab/PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
OpenGVLab/V2PE
[arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
OpenGVLab/TimeSuite
[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
OpenGVLab/Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
OpenGVLab/FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
OpenGVLab/OV-OAD
This repo takes the initial step towards leveraging text learning for online action detection without explicit human supervision.
OpenGVLab/.github
OpenGVLab/VLMEvalKit_InternVL2_5
An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks