ChenDRAG's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and Flax.
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
huggingface/trl
Train transformer language models with reinforcement learning.
lllyasviel/IC-Light
More relighting!
imoneoi/openchat
OpenChat: Advancing Open-source Language Models with Imperfect Data
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
openai/consistencydecoder
Consistency Distilled Diff VAE
OpenLLMAI/OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning, iterative DPO, LoRA, Mixtral).
THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
baofff/U-ViT
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
opendilab/awesome-diffusion-model-in-rl
A curated list of resources on diffusion models in RL (continually updated)
lichao-sun/SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
apexrl/Diff4RLSurvey
This repository contains a collection of resources and papers on Diffusion Models for RL, accompanying the paper "Diffusion Models for Reinforcement Learning: A Survey"
kvablack/ddpo-pytorch
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
yuvalkirstain/PickScore
tgxs002/HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
jannerm/ddpo
Code for the paper "Training Diffusion Models with Reinforcement Learning"
OpenBMB/UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
SalesforceAIResearch/DiffusionDPO
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
OpenBMB/Eurus
mihirp1998/AlignProp
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25× more sample- and compute-efficient than reinforcement learning methods (PPO) for fine-tuning Stable Diffusion.
yk7333/d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
alibaba/VideoMV
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
thu-ml/Noise-Contrastive-Alignment
Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)
somvy/slic-hf
Experiments with divergence functions for DPO and RLHF
Ghost---Shadow/sequence-likelihood-calibration
Reproduction of SLiC-HF: Sequence Likelihood Calibration with Human Feedback