shu-le's Stars
YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Shentao-YANG/Dense_Reward_T2I
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
CaraJ7/CoMat
[NeurIPS 2024] 💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
mihirp1998/AlignProp
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
YBYBZhang/VideoElevator
[arXiv 2024] Official PyTorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models"
yk7333/d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
UMass-Foundation-Model/3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
ChenHsing/Awesome-Video-Diffusion-Models
[CSUR] A Survey on Video Diffusion Models
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Mowenyii/PAE
[CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation
huggingface/trl
Train transformer language models with reinforcement learning.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
microsoft/DeepSpeedExamples
Example models using DeepSpeed
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
evalcrafter/EvalCrafter
[CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
lixinustc/KVQ-Challenge-CVPR-NTIRE2024
The first challenge on short-form video quality assessment
haoningwu3639/StoryGen
[CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
heheyas/V3D
V3D: Video Diffusion Models are Effective 3D Generators
TylerYep/torchinfo
View model summaries in PyTorch!
ExponentialML/Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
ali-vilab/VGen
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
google/latexify_py
A library to generate LaTeX expressions from Python code.
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF