# Finetune-Llama3-using-Direct-Preference-Optimization

This repository contains the Jupyter Notebooks, scripts, and datasets used in our finetuning experiments. The project focuses on Direct Preference Optimization (DPO), a method that simplifies the traditional RLHF finetuning pipeline: instead of training a separate reward model, DPO optimizes the policy directly on pairs of preferred and rejected responses, using the policy's log-probabilities relative to a frozen reference model as an implicit reward.
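To make the idea concrete, here is a minimal sketch of the per-example DPO loss, assuming the per-sequence log-probabilities under the policy and the frozen reference model have already been computed. The function name, signature, and `beta` default are illustrative, not the exact code used in the notebooks.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from per-sequence log-probabilities.

    beta controls how strongly the policy is allowed to deviate
    from the reference model.
    """
    # Implicit rewards: how much probability mass the policy shifted
    # toward each response relative to the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin: the loss shrinks as the
    # policy prefers the chosen response more than the reference does.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss is `ln 2`; increasing the margin in favor of the chosen response drives the loss toward zero. In practice this is computed over batches of token-level log-probabilities with a framework such as PyTorch.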

Primary language: Jupyter Notebook. Licensed under the Apache License 2.0.
