Pinned Repositories
Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as MiniGPT-4, StableLM, and MOSS.
DragGAN
Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; code and models fully open-sourced; supports Windows, macOS, and Linux)
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com.
InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal dialogue model approaching GPT-4o's performance)
LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
VideoMamba
[ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
VisionLLM
VisionLLM Series
OpenGVLab's Repositories
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal dialogue model approaching GPT-4o's performance)
OpenGVLab/InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
OpenGVLab/OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
OpenGVLab/ScaleCUA
ScaleCUA is a family of open-source computer-use agents that operate in cross-platform environments (Windows, macOS, Ubuntu, Android).
OpenGVLab/VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
OpenGVLab/OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
OpenGVLab/PonderV2
[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
OpenGVLab/EfficientQAT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
OpenGVLab/VideoChat-R1
[NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
OpenGVLab/EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for the EgoVis Challenges at CVPR 2024
OpenGVLab/GUI-Odyssey
[ICCV 2025] GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. It consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 212 apps, and 1.4K app combinations.
OpenGVLab/PIIP
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
OpenGVLab/ZeroGUI
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
OpenGVLab/Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
OpenGVLab/VeBrain
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
OpenGVLab/NaViL
OpenGVLab/MUTR
[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
OpenGVLab/EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
OpenGVLab/SDLM
Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high efficiency and throughput.
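The SDLM blurb above describes the decoding idea only at a high level. As a rough, hypothetical sketch (not the SDLM code or API), the toy loop below shows block-wise decoding in which only a confident prefix of each proposed block is committed, so the number of tokens emitted per step is adaptive and already-accepted tokens (and hence their KV-cache entries) are never revised. The names propose_block, BLOCK, and TAU are illustrative stand-ins, not identifiers from the repository.

```python
# Conceptual sketch (NOT the SDLM implementation): block-wise decoding where the
# number of tokens committed per step is decided adaptively from per-token
# confidence, so previously accepted tokens' KV-cache entries never need to be
# recomputed. All names below are hypothetical stand-ins.
import random

VOCAB = list("abcdefgh ")   # toy vocabulary
BLOCK = 4                   # tokens proposed per step
TAU = 0.6                   # confidence threshold for committing a token

def propose_block(context, size=BLOCK):
    """Stand-in for the model: return (token, confidence) pairs for one block."""
    # `context` is unused in this stub; a real model would condition on it.
    return [(random.choice(VOCAB), random.random()) for _ in range(size)]

def decode(prompt, max_len=20):
    accepted = list(prompt)          # tokens whose KV cache would be kept
    while len(accepted) < max_len:
        block = propose_block(accepted)
        # Adaptively determine how much of the block to commit: keep the longest
        # prefix whose per-token confidence clears the threshold, but always
        # accept at least one token so decoding makes progress.
        keep = 0
        for _, conf in block:
            if conf >= TAU:
                keep += 1
            else:
                break
        keep = max(keep, 1)
        accepted.extend(tok for tok, _ in block[:keep])
        # Only the newly accepted tokens would be appended to the KV cache here;
        # nothing already accepted is ever invalidated (cache compatibility).
    return "".join(accepted[:max_len])

print(decode("hi "))
```

In SDLM itself the block proposals and confidences presumably come from the model's diffusion-style parallel prediction rather than the random scores used here; the sketch only illustrates the adaptive-length, cache-preserving control flow named in the description.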
OpenGVLab/LORIS
[ICML2023] Long-Term Rhythmic Video Soundtracker
OpenGVLab/TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
OpenGVLab/PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
OpenGVLab/GenExam
GenExam: A Multidisciplinary Text-to-Image Exam
OpenGVLab/MetaCaptioner
OpenGVLab/Docopilot
[CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding
OpenGVLab/FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
OpenGVLab/Vlaser
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
OpenGVLab/VRBench
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
OpenGVLab/SID-VLN
Official implementation of "Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale"
OpenGVLab/ExpVid