/phi-2-sft-and-dpo

Notebooks for creating an instruction-following version of Microsoft's Phi-2 LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).

Primary Language: Jupyter Notebook
