Pinned Repositories
Combination
Recogntion-Error-Classification
reid
SWSI
Word-Burst
alignment-handbook
Robust recipes to align language models with human and AI preferences
MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO