jiaojile's Stars
jshilong/GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
FoundationVision/Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
csbobby/SOK-Bench
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
shikras/shikra
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ucas-HaoranWei/Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
csbobby/STAR_Benchmark
STAR: A Benchmark for Situated Reasoning in Real-World Videos
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
OpenGVLab/all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
THUDM/ChatGLM3
ChatGLM3 series: Open Bilingual Chat LLMs
LLaVA-VL/LLaVA-NeXT
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
LetheSec/HuggingFace-Download-Accelerator
Use HuggingFace's official download tool to download at high speed from mirror sites.
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs such as MiniGPT-4, StableLM, and MOSS.
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
DLYuanGod/TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
PKU-YuanGroup/Video-LLaVA
[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Meituan-AutoML/MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
mini-sora/minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
codefuse-ai/CodeFuse-MFT-VLM