A (somewhat) minimal library for finetuning language models with PPO on human feedback.
Primary Language: Python