/alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

Primary LanguagePythonMIT LicenseMIT

Issues