SimonCuCu's Stars
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Zanette-Labs/SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
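For context, a minimal sketch of the plain best-of-N baseline that Speculative Rejection accelerates (the repo's method ranks partial generations with the reward model and halts low-scoring candidates early). `generate` and `reward` here are assumed caller-supplied callables, not this repo's API:

```python
def best_of_n(generate, reward, prompt, n=8):
    """Plain best-of-N decoding: sample n completions, keep the highest-reward one.

    Assumptions (hypothetical interfaces, not from the repo):
      generate(prompt) -> str       draws one sampled completion
      reward(prompt, completion) -> float   scores a completion
    """
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]
```

The cost of this baseline grows linearly with n, which is what motivates rejecting unpromising candidates mid-generation.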
mnoukhov/async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
YuxiXie/MCTS-DPO
Source code for Self-Evaluation Guided MCTS for online DPO.
sail-sg/CPO
[NeurIPS 2024] Official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs".
ZHZisZZ/weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
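A rough sketch of the multiple-decoding-heads idea (illustrative only, not Medusa's actual code): K lightweight heads sit on top of the base model's last hidden state, with head k proposing the token k+1 positions ahead, so several tokens can be drafted per forward pass and then verified together:

```python
import torch
import torch.nn as nn

class MultiDecodingHeads(nn.Module):
    """Sketch of Medusa-style extra heads (hypothetical module, not the repo's API).

    Head k maps the current position's hidden state to logits for the token
    k+1 steps ahead; the base LM head still predicts the next token.
    """
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.SiLU(),
                nn.Linear(hidden_size, vocab_size),
            )
            for _ in range(num_heads)
        )

    def forward(self, last_hidden: torch.Tensor) -> list[torch.Tensor]:
        # last_hidden: (batch, hidden_size) at the current position.
        # Returns K logit tensors, one per lookahead offset.
        return [head(last_hidden) for head in self.heads]
```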
rohinmanvi/Capability-Aware_and_Mid-Generation_Self-Evaluations
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
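The DPO objective itself is compact; a minimal PyTorch sketch of the loss from the paper (inputs are per-sequence log-probabilities under the policy and a frozen reference model; this mirrors the published formula, not necessarily this repo's exact code):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * (delta_policy - delta_ref)),
    where delta = logp(chosen) - logp(rejected).

    All inputs are shape-(batch,) tensors of summed sequence log-probs;
    beta controls the implicit KL penalty against the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```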
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
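The core of speculative decoding is a verification step that keeps the output distribution exactly equal to the target model's. A minimal sketch of that step under the standard formulation (Leviathan et al. 2023 / Chen et al. 2023; function and argument names are illustrative, not this repo's API):

```python
import torch

def speculative_accept(p_target: torch.Tensor, q_draft: torch.Tensor, token: int) -> int:
    """Verify one draft token: accept with prob min(1, p/q); on rejection,
    resample from the residual distribution max(0, p - q), renormalized.

    p_target, q_draft: probability vectors over the vocabulary at this position;
    token: the token id proposed by the draft model.
    """
    p, q = p_target[token], q_draft[token]
    if torch.rand(()) < torch.clamp(p / q, max=1.0):
        return token  # accepted; overall samples are distributed as the target model's
    residual = torch.clamp(p_target - q_draft, min=0.0)
    residual = residual / residual.sum()
    return int(torch.multinomial(residual, 1))
```

Because accepted drafts cost only one target-model forward pass per block, throughput improves whenever the draft model agrees with the target often enough.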
Xwin-LM/Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
OpenRLHF/OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & RingAttention & RFT)
huggingface/trl
Train transformer language models with reinforcement learning.
microsoft/DeepSpeedExamples
Example models using DeepSpeed