Implementation of the training framework proposed in Self-Rewarding Language Models, from Meta AI.
They really took the title of the DPO paper ("Your Language Model is Secretly a Reward Model") to heart.
@misc{yuan2024selfrewarding,
title = {Self-Rewarding Language Models},
author = {Weizhe Yuan and Richard Yuanzhe Pang and Kyunghyun Cho and Sainbayar Sukhbaatar and Jing Xu and Jason Weston},
year = {2024},
eprint = {2401.10020},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}