rlaif
There are 9 repositories under the rlaif topic.
argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
mengdi-li/awesome-RLAIF
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
holarissun/Prompt-OIRL
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
vicgalle/zero-shot-reward-models
ZYN: Zero-Shot Reward Models with Yes-No Questions
CIntellifusion/VideoDPO
Official Implementation of VideoDPO
dannylee1020/openpo
Framework for synthetic data generation with AI feedback
zhaochen0110/Timo
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
vicgalle/distilled-self-critique
Distilled Self-Critique refines the outputs of an LLM using only synthetic data
vicgalle/awesome-rlaif
A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)