l294265421/alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
Python · MIT
Issues
- 8
Some questions about deepspeed.initialize
#8 opened by iamsile - 0
Increasing max_prompt_len and max_ans_len causes an illegal memory access error during training
#16 opened by Luoxiaohei41 - 0
Training problem
#15 opened by wanghao-007 - 0
Step 3: Actor model and Reward model use different tokenizers
#14 opened by Kevin-myxu - 1
The padding side seems to differ between step2 and step3?
#13 opened by qiancheng99 - 1
A question about setting tokens
#12 opened by hepj987 - 5
- 12
v100 step3 oom
#6 opened by iamsile - 1
Should the tokens after eos in the generated answer be masked out in Step3?
#9 opened by Ablustrund - 2
Fix pad_token_id bug
#10 opened by Ablustrund - 2
how to run it, need more details
#7 opened by SeekPoint - 4
stop at step2 evaluation_reward
#5 opened by murphypei - 3
How good are the training results?
#1 opened by Curious-chen - 2
Reward model training hangs on v100
#4 opened by iamsile - 1
- 2
GPU memory OOM when training on v100
#3 opened by iamsile