hn18001's Stars
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest papers and datasets on Multimodal Large Language Models, and their evaluation.
InternLM/InternLM
Official release of InternLM2 7B and 20B base and chat models, with 200K context support.
THUDM/CogVLM
A state-of-the-art open visual language model (multimodal pretrained model).
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4V.
hustvl/Vim
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
X-PLUG/mPLUG-Owl
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
ytongbai/LVM
Ucas-HaoranWei/Vary
Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
sczhou/Upscale-A-Video
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
dvlab-research/LLaMA-VID
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
allenai/unified-io-2
csuhan/OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
hkproj/pytorch-transformer
Implementation of "Attention Is All You Need".
fh2019ustc/DocTr-Plus
The official code for “Deep Unrestricted Document Image Rectification”, TMM, 2023.
X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
jiawen-zhu/ViPT
[CVPR23] Visual Prompt Multi-Modal Tracking
tsb0601/MMVP
GuHuangAI/DiffusionEdge
Code for AAAI 2024 paper: "DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection"
ZYM-PKU/UDiffText
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
lzw-lzw/LEGO
LEGO: Language-Enhanced Multi-modal Grounding Model.
mu-cai/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
huangb23/VTimeLLM
[CVPR 2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
PKU-YuanGroup/Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
zhang-zx/AVID
This repository contains the code for AVID: Any-Length Video Inpainting with Diffusion Model.
NevSNev/FGDVI
Flow-Guided Diffusion for Video Inpainting