wudemoai/Simple_LLM_PPO

Jupyter Notebook

A simple implementation of training a large language model with PPO.

Environment:

python=3.10

torch==2.1.0(cuda)

transformers==4.34.0

datasets==2.14.5

trl==0.7.2
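
To confirm that a local environment matches the versions pinned above, a small check along these lines may help. This is a minimal sketch and not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes that CUDA builds of torch may report a local suffix (for example `2.1.0+cu121`), which is ignored in the comparison.

```python
from importlib import metadata

# Pins copied from the environment list above.
EXPECTED = {
    "torch": "2.1.0",
    "transformers": "4.34.0",
    "datasets": "2.14.5",
    "trl": "0.7.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Assumption: drop a local build suffix such as "+cu121" before comparing.
        public = found.split("+")[0] if found else None
        if public != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or differs from its pin, and prints nothing when all four match.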

Video course: https://www.bilibili.com/video/BV1uy4y1c7DV