NROwind's Stars
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, and a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world LLM application deployment).
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
InternLM/InternLM
Official release of InternLM2.5 base and chat models, with 1M-token context support.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
315386775/DeepLearing-Interview-Awesome-2024
A collection of interview questions and answers for AIGC, CV, and LLM roles, along with new ideas, questions, resources, and projects encountered in work and research.
salesforce/ALBEF
Code for ALBEF: a new vision-language pre-training method
datawhalechina/tiny-universe
"A White-Box Guide to Building Large Models": a Tiny-Universe built entirely from scratch by hand.
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks.
multimodal-art-projection/MAP-NEO
mindspore-courses/step_into_llm
MindSpore online courses: Step into LLM
Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models, parameter-efficient fine-tuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)
yfzhang114/Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
daixiangzi/Awesome-Token-Compress
A paper list of recent work on token compression for ViT and VLM.
alipay/Ant-Multi-Modal-Framework
Research code from the Multimodal Cognition Team at Ant Group.
liguopeng0923/UCVGL
[CVPR 2024🔥] Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
OpenGVLab/LCL
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
PromptExpert/blogs
zhourax/VEGA
fawazsammani/awesome-vision-language-pretraining
Awesome Vision-Language Pretraining Papers
FeipengMa6/VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
HKUST-LongGroup/CoMM
Official repository for CoMM Dataset
Zi-hao-Wei/Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
xihuai18/arxiv-sanity-x