hn18001's Stars
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest papers and datasets on Multimodal Large Language Models, and their evaluation.
InternLM/InternLM
Official release of InternLM2 7B and 20B base and chat models, with 200K context support.
THUDM/CogVLM
A state-of-the-art open visual language model (multimodal pretrained model).
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4V.
hustvl/Vim
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
X-PLUG/mPLUG-Owl
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
ytongbai/LVM
Ucas-HaoranWei/Vary
Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
sczhou/Upscale-A-Video
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
dvlab-research/LLaMA-VID
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
allenai/unified-io-2
csuhan/OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
hkproj/pytorch-transformer
Implementation of "Attention Is All You Need".
fh2019ustc/DocTr-Plus
The official code for “Deep Unrestricted Document Image Rectification”, TMM, 2023.
X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
jiawen-zhu/ViPT
[CVPR23] Visual Prompt Multi-Modal Tracking
tsb0601/MMVP
GuHuangAI/DiffusionEdge
Code for AAAI 2024 paper: "DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection"
ZYM-PKU/UDiffText
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
lzw-lzw/LEGO
LEGO: Language-Enhanced Multi-modal Grounding Model.
mu-cai/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
huangb23/VTimeLLM
[CVPR 2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
PKU-YuanGroup/Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
zhang-zx/AVID
This repository contains the code for AVID: Any-Length Video Inpainting with Diffusion Model.
NevSNev/FGDVI
Flow-Guided Diffusion for Video Inpainting