rlaif

There are 9 repositories under the rlaif topic.

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python 1.8k 17 451147
mengdi-li/awesome-RLAIF
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
148 6 14
holarissun/Prompt-OIRL
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
Language: Python 35 3 45
vicgalle/zero-shot-reward-models
ZYN: Zero-Shot Reward Models with Yes-No Questions
Language: Python 33 2 18
CIntellifusion/VideoDPO
Official Implementation of VideoDPO
Language: Python 270
dannylee1020/openpo
Framework for synthetic data generation with AI feedback
Language: Python 26 3 00
zhaochen0110/Timo
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
Language: Python 19 1 11
vicgalle/distilled-self-critique
Distilled Self-Critique refines the outputs of an LLM using only synthetic data
Language: Jupyter Notebook 11 2 00
vicgalle/awesome-rlaif
A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)
10 2 00