GoGiants1's Stars
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs and 100+ MLLMs. (LLMs: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLMs: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
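For orientation on the PEFT path, here is a minimal LoRA setup using the Hugging Face peft library that ms-swift builds on; the checkpoint name and target modules are illustrative assumptions, not ms-swift's own API:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed checkpoint, chosen only for illustration.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```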
TideDra/VL-RLHF
An RLHF infrastructure for vision-language models
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
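A minimal offline-inference sketch with vLLM's Python API; the checkpoint name is an assumption:

```python
from vllm import LLM, SamplingParams

# Load the model and pre-allocate the paged KV cache.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # assumed checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Summarize continuous batching in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```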
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
allenai/open-instruct
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
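The reference-free reward in the title is the length-normalized log-likelihood of a response under the policy itself; a sketch of the objective, following the paper's notation:

```latex
% SimPO: Bradley-Terry loss on length-normalized, reference-free rewards,
% with a target reward margin \gamma separating winner y_w from loser y_l.
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log\sigma\!\left(
      \frac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x)
      - \frac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x)
      - \gamma
    \right)
  \right]
```

Unlike DPO, no reference model appears in the loss, and dividing by |y| counters the usual bias toward longer responses.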
ytongbai/LVM
HKUST-LongGroup/CoMM
Official repository for the CoMM dataset
xichenpan/Kosmos-G
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
AILab-CVC/SEED-X
Multimodal Models in the Real World
AILab-CVC/SEED
Official implementation of SEED-LLaMA (ICLR 2024).
apple/ml-aim
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
lucidrains/LVMAE-pytorch
PyTorch implementation of LVMAE, proposed in the paper "Extending Video Masked Autoencoders to 128 Frames"
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
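For context, SGEMM is the single-precision general matrix multiply from BLAS:

```latex
C \leftarrow \alpha A B + \beta C,
\qquad A \in \mathbb{R}^{M \times K},\;
B \in \mathbb{R}^{K \times N},\;
C \in \mathbb{R}^{M \times N}
```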
yzhaiustc/Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernels on NVIDIA GPUs to close-to-cuBLAS performance.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
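A minimal client sketch against SGLang's OpenAI-compatible endpoint; it assumes a server started with `python -m sglang.launch_server --model-path <model>` on the default port:

```python
from openai import OpenAI

# SGLang serves an OpenAI-compatible API; port 30000 is the default,
# and "default" routes to the single loaded model (assumed setup).
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What does RadixAttention cache?"}],
)
print(resp.choices[0].message.content)
```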
bytedance/1d-tokenizer
This repo contains the code for the 1D tokenizer and generator
KellerJordan/modded-nanogpt
NanoGPT (124M) in 5 minutes
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
PKU-YuanGroup/LLaVA-CoT
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
ys-zong/VL-ICL
Code for the paper "VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning"
UW-Madison-Lee-Lab/CoBSAT
Implementation and dataset for the paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
leloykun/mmsg
Generate interleaved text and image content in a structured format you can directly pass to downstream APIs.
RLHF-V/RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
ostris/ai-toolkit
Various AI scripts, mostly for Stable Diffusion.
baaivision/Emu3
Next-Token Prediction is All You Need