l294265421/alpaca-rlhf

Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat

Python, MIT License

16 Issues
109 Stargazers
3 Watchers

