
Primary Language: Jupyter Notebook

Reinforcement Learning from Human Feedback (RLHF)

RLHF pipeline:

  • STEP1: Ziegler2020
  • STEP2: HF trl
  • STEP3: trlx

Problems

  • Problem1: Training GPT2 with PPO and a reward model
  • Problem2: MathGPT
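
For orientation on Problem1, here is a minimal sketch of the two quantities that PPO-based RLHF setups commonly combine: a per-token reward shaped by a KL penalty against a frozen reference model, and the PPO clipped surrogate objective. It is not taken from this repository's notebooks. The function names, the `beta` and `clip_eps` defaults, and the choice to add the reward-model score on the last token are all illustrative assumptions.

```python
import math
from typing import List


def kl_shaped_rewards(
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    reward_score: float,
    beta: float = 0.1,  # assumed KL coefficient, not a value from this repo
) -> List[float]:
    """Per-token rewards: a KL penalty on every token, plus the
    reward-model score added on the final token (an assumed convention)."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        raise ValueError("need at least one generated token")
    # Penalize tokens where the policy drifts away from the reference model.
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    # The scalar score from the reward model is credited to the last token.
    rewards[-1] += reward_score
    return rewards


def ppo_clipped_objective(
    new_logprob: float,
    old_logprob: float,
    advantage: float,
    clip_eps: float = 0.2,  # assumed clipping range
) -> float:
    """PPO clipped surrogate for a single token (to be maximized)."""
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    shaped = kl_shaped_rewards([-1.2, -0.7], [-1.0, -0.9], reward_score=1.5)
    print(shaped)
    print(ppo_clipped_objective(new_logprob=-0.5, old_logprob=-0.7, advantage=1.0))
```

In a real training loop these values would be computed on tensors over batches of generated tokens, with advantages estimated from a value head. The scalar version here only shows the arithmetic.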

AI Cloud