thomfoster/minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
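The description refers to PPO (Proximal Policy Optimization), whose core is the clipped surrogate objective. As background, a generic sketch of that loss (an illustration only, not minRLHF's actual code) might look like:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio between the current and old policy for each action.
    ratio = np.exp(logp_new - logp_old)
    # PPO's clipped surrogate: take the pessimistic (minimum) of the
    # unclipped and clipped objectives, then negate for gradient descent.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# When the policies agree (equal log-probs), the ratio is 1 and the loss
# reduces to the negative mean advantage.
loss = ppo_clip_loss(np.zeros(3), np.zeros(3), np.array([1.0, 2.0, 3.0]))
print(loss)  # → -2.0
```

In an RLHF setting, `logp_new`/`logp_old` would be per-token log-probabilities from the language model being finetuned, and `advantages` would come from a value head over the reward model's scores.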
Python
Issues
- #5 About Advantage Normalization, opened by 1140310118 (1 comment)
- #4 Question about reward augmentation, opened by zerlinwang (0 comments)
- #3 About jax code, opened by sglucas (0 comments)
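Issue #5 concerns advantage normalization. As background, the standard trick most PPO implementations use (a generic sketch, not necessarily what minRLHF does) is to standardize the advantage estimates within each batch before computing the policy loss:

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    # Standardize advantages to zero mean and unit variance across the batch.
    # This is a common variance-reduction heuristic in PPO: it stabilizes
    # the scale of the policy-gradient signal across batches.
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)

adv = normalize_advantages([1.0, 2.0, 3.0, 4.0])
print(adv.mean())  # ~0.0
```

Whether to normalize per batch, per minibatch, or not at all is a recurring implementation question in PPO, which is presumably what the issue discusses.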