ChangyuChen347's Stars
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
OpenRLHF/OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & RingAttention)
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
hendrycks/test
Measuring Massive Multitask Language Understanding | ICLR 2021
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
hsiehjackson/RULER
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
lmarena/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
corl-team/CORL
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
LiveBench/LiveBench
LiveBench: A Challenging, Contamination-Free LLM Benchmark
p-lambda/dsir
DSIR large-scale data selection framework for language model training
Psycoy/MixEval
The official evaluation suite and dynamic data release for MixEval.
TIGER-AI-Lab/MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
openpsi-project/ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Vance0124/Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
snu-mllab/EDAC
Official PyTorch implementation of "Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble" (NeurIPS'21)
chujiezheng/LLM-Extrapolation
Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"
shenao-zhang/SELM
The official implementation of Self-Exploring Language Models (SELM)
hamishivi/EasyLM
Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.
haozheji/exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
RUCAIBox/JiuZhang3.0
The code and data for the paper "JiuZhang3.0"
thu-ml/Noise-Contrastive-Alignment
Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)
VinAIResearch/RecGPT
RecGPT: Generative Pre-training for Text-based Recommendation (ACL 2024)
wzhouad/WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
thanhnguyentang/mmdrl
Official repo for our AAAI'21 paper, https://arxiv.org/abs/2007.12354
ZhaolinGao/REBEL
ars22/scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
TsinghuaC3I/Intuitive-Fine-Tuning
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
alecwangcq/f-divergence-dpo
Direct preference optimization with f-divergences.
morganf33/GNR
Code for "Generative News Recommendation"