SimonCuCu's Stars
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Zanette-Labs/SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
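For context, a minimal sketch of the plain best-of-N baseline that Speculative Rejection accelerates (the repo's method ranks partial generations with the reward model and halts low-scoring candidates early). `generate` and `reward` here are assumed caller-supplied callables, not this repo's API:

```python
def best_of_n(generate, reward, prompt, n=8):
    """Plain best-of-N decoding: sample n completions, keep the highest-reward one.

    Assumptions (hypothetical interfaces, not from the repo):
      generate(prompt) -> str       draws one sampled completion
      reward(prompt, completion) -> float   scores a completion
    """
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]
```

The cost of this baseline grows linearly with n, which is what motivates rejecting unpromising candidates mid-generation.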
mnoukhov/async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
YuxiXie/MCTS-DPO
Source code for Self-Evaluation Guided MCTS for online DPO.
sail-sg/CPO
[NeurIPS 2024] Official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs".
ZHZisZZ/weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
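A rough sketch of the multiple-decoding-heads idea (illustrative only, not Medusa's actual code): K lightweight heads sit on top of the base model's last hidden state, with head k proposing the token k+1 positions ahead, so several tokens can be drafted per forward pass and then verified together:

```python
import torch
import torch.nn as nn

class MultiDecodingHeads(nn.Module):
    """Sketch of Medusa-style extra heads (hypothetical module, not the repo's API).

    Head k maps the current position's hidden state to logits for the token
    k+1 steps ahead; the base LM head still predicts the next token.
    """
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.SiLU(),
                nn.Linear(hidden_size, vocab_size),
            )
            for _ in range(num_heads)
        )

    def forward(self, last_hidden: torch.Tensor) -> list[torch.Tensor]:
        # last_hidden: (batch, hidden_size) at the current position.
        # Returns K logit tensors, one per lookahead offset.
        return [head(last_hidden) for head in self.heads]
```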
rohinmanvi/Capability-Aware_and_Mid-Generation_Self-Evaluations
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
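The DPO objective itself is compact; a minimal PyTorch sketch of the loss from the paper (inputs are per-sequence log-probabilities under the policy and a frozen reference model; this mirrors the published formula, not necessarily this repo's exact code):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * (delta_policy - delta_ref)),
    where delta = logp(chosen) - logp(rejected).

    All inputs are shape-(batch,) tensors of summed sequence log-probs;
    beta controls the implicit KL penalty against the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```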
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
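The core of speculative decoding is a verification step that keeps the output distribution exactly equal to the target model's. A minimal sketch of that step under the standard formulation (Leviathan et al. 2023 / Chen et al. 2023; function and argument names are illustrative, not this repo's API):

```python
import torch

def speculative_accept(p_target: torch.Tensor, q_draft: torch.Tensor, token: int) -> int:
    """Verify one draft token: accept with prob min(1, p/q); on rejection,
    resample from the residual distribution max(0, p - q), renormalized.

    p_target, q_draft: probability vectors over the vocabulary at this position;
    token: the token id proposed by the draft model.
    """
    p, q = p_target[token], q_draft[token]
    if torch.rand(()) < torch.clamp(p / q, max=1.0):
        return token  # accepted; overall samples are distributed as the target model's
    residual = torch.clamp(p_target - q_draft, min=0.0)
    residual = residual / residual.sum()
    return int(torch.multinomial(residual, 1))
```

Because accepted drafts cost only one target-model forward pass per block, throughput improves whenever the draft model agrees with the target often enough.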
Xwin-LM/Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
OpenRLHF/OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & RingAttention & RFT)
huggingface/trl
Train transformer language models with reinforcement learning.
microsoft/DeepSpeedExamples
Example models using DeepSpeed