clam004/minichatgpt

Annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback (RLHF), connecting the equations of PPO and GAE to the corresponding lines of code in the PyTorch implementation.

Jupyter Notebook

Stargazers

alirzaMhd
Alloooshe
St. Petersburg, Russia
archwolf118
bdx0
bdx0.io.vn
chenlulouis
dannielum
New York
EnernityTwinkle
happyPydog
Taipei
juntengzhang
laurencecwj
meet-cjli
MrCsabaToth
Csaba Consulting
Near32
nschlemm
coder nostra GmbH
peter0083
United States
Rajae-Robinson
Kingston, Jamaica
SatchalPatil
xinagand