A repository for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
Primary language: Python. License: MIT.
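
To illustrate what best-of-n selection with a reward model ensemble means, here is a minimal sketch. It is not this repository's API: the names `ensemble_score`, `best_of_n`, and the two toy reward functions are hypothetical, and real reward models would be learned networks rather than hand-written rules. The sketch scores each candidate response with every ensemble member, aggregates the scores (mean, or min as a conservative alternative), and returns the highest-scoring candidate.

```python
from statistics import mean
from typing import Callable, Sequence

# A reward function maps (prompt, response) to a scalar score.
# Hypothetical stand-in for a learned reward model.
RewardFn = Callable[[str, str], float]


def toy_length_reward(prompt: str, response: str) -> float:
    """Toy reward: number of whitespace-separated words in the response."""
    return float(len(response.split()))


def toy_politeness_reward(prompt: str, response: str) -> float:
    """Toy reward: 1.0 if the response contains 'please', else 0.0."""
    return 1.0 if "please" in response.lower() else 0.0


def ensemble_score(
    prompt: str,
    response: str,
    reward_fns: Sequence[RewardFn],
    aggregate: str = "mean",
) -> float:
    """Aggregate the scores of all ensemble members for one response."""
    if not reward_fns:
        raise ValueError("reward_fns must not be empty")
    scores = [fn(prompt, response) for fn in reward_fns]
    if aggregate == "mean":
        return mean(scores)
    if aggregate == "min":
        # Conservative choice: trust only the most pessimistic member.
        return min(scores)
    raise ValueError(f"unknown aggregate: {aggregate!r}")


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fns: Sequence[RewardFn],
    aggregate: str = "mean",
) -> str:
    """Return the candidate with the highest aggregated ensemble score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(
        candidates,
        key=lambda r: ensemble_score(prompt, r, reward_fns, aggregate),
    )
```

In practice the candidates would be n samples drawn from the policy LLM for the same prompt. The choice of aggregation matters: with the toy functions above, mean and min aggregation can select different candidates, because min penalizes any response that a single ensemble member scores poorly.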