thomfoster/minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
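The description refers to PPO (Proximal Policy Optimization), whose core is the clipped surrogate objective. As background, a generic sketch of that loss (an illustration only, not minRLHF's actual code) might look like:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio between the current and old policy for each action.
    ratio = np.exp(logp_new - logp_old)
    # PPO's clipped surrogate: take the pessimistic (minimum) of the
    # unclipped and clipped objectives, then negate for gradient descent.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# When the policies agree (equal log-probs), the ratio is 1 and the loss
# reduces to the negative mean advantage.
loss = ppo_clip_loss(np.zeros(3), np.zeros(3), np.array([1.0, 2.0, 3.0]))
print(loss)  # → -2.0
```

In an RLHF setting, `logp_new`/`logp_old` would be per-token log-probabilities from the language model being finetuned, and `advantages` would come from a value head over the reward model's scores.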
Python
Issues
- #5 About Advantage Normalization, opened by 1140310118 (1 comment)
- #4 Question about reward augmentation, opened by zerlinwang (0 comments)
- #3 About jax code, opened by sglucas (0 comments)
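Issue #5 concerns advantage normalization. As background, the standard trick most PPO implementations use (a generic sketch, not necessarily what minRLHF does) is to standardize the advantage estimates within each batch before computing the policy loss:

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    # Standardize advantages to zero mean and unit variance across the batch.
    # This is a common variance-reduction heuristic in PPO: it stabilizes
    # the scale of the policy-gradient signal across batches.
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)

adv = normalize_advantages([1.0, 2.0, 3.0, 4.0])
print(adv.mean())  # ~0.0
```

Whether to normalize per batch, per minibatch, or not at all is a recurring implementation question in PPO, which is presumably what the issue discusses.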