clam004/minichatgpt
annotated tutorial of the huggingface TRL repo for reinforcement learning from human feedback connecting equations from PPO and GAE to the lines of code in the pytorch implementation
Jupyter Notebook
annotated tutorial of the huggingface TRL repo for reinforcement learning from human feedback connecting equations from PPO and GAE to the lines of code in the pytorch implementation
Jupyter Notebook