cheng221's Stars
Tencent/MimicMotion
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
deepseek-ai/DeepSeek-R1
km1994/LLMs_interview_notes
This repository mainly collects interview questions for large language model (LLM) algorithm engineers
deepseek-ai/DeepSeek-VL2
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
ikatyang/emoji-cheat-sheet
A markdown version emoji cheat sheet
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
showlab/Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
baaivision/Emu3
Next-Token Prediction is All You Need
instantX-research/Regional-Prompting-FLUX
Training-free Regional Prompting for Diffusion Transformers 🔥
gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
kvablack/ddpo-pytorch
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
PaddlePaddle/PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
sihyun-yu/REPA
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
PaddlePaddle/Paddle
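PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)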
Xnhyacinth/Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
shallowdream204/DreamClear
[NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
QwenLM/Qwen2.5-VL
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
gerdm/prml
Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop
cure-lab/PnPInversion
[ICLR 2024] Official repo for paper "PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code"
ohayonguy/PMRF
Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
EternalEvan/FlowIE
This repository contains the official implementation of "FlowIE: Efficient Image Enhancement via Rectified Flow"
IDKiro/sdxs
Official repo of our paper "SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions"