Notes and commented code for RLHF (PPO)
The commented code is based on version 0.7.10 of the trl library from Hugging Face: https://github.com/huggingface/trl/
You will find both the original ppo_trainer.py file and the commented version. You can use any diff tool to compare the two and see my comments.
The video is here: https://www.youtube.com/watch?v=qGyFrqc34yc