omarmnfy/Finetune-Llama3-using-Direct-Preference-Optimization
This repository contains the Jupyter notebooks, scripts, and datasets used in our finetuning experiments. The project focuses on Direct Preference Optimization (DPO), a method that simplifies the traditional RLHF finetuning pipeline by treating the language model itself as an implicit reward model, so it can be trained directly on human preference pairs without a separately trained reward model.
License: Apache-2.0
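For orientation, here is a minimal sketch of what a DPO run looks like in practice, using Hugging Face TRL's DPOTrainer. The checkpoint name, hyperparameters, and the tiny inline dataset below are illustrative assumptions, not this repository's actual configuration, and some argument names (e.g. processing_class vs. tokenizer) vary across TRL versions:

```python
# Minimal DPO finetuning sketch using Hugging Face TRL's DPOTrainer.
# Assumes transformers, trl, and datasets are installed; names and defaults
# follow recent TRL releases and may differ in the version you use.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO trains directly on preference pairs: each row holds a prompt, the
# preferred ("chosen") response, and the dispreferred ("rejected") one.
train_dataset = Dataset.from_dict({
    "prompt": ["Explain DPO in one sentence."],
    "chosen": ["DPO finetunes a model directly on preference pairs, with no separate reward model."],
    "rejected": ["DPO is a kind of database."],
})

config = DPOConfig(
    output_dir="llama3-dpo",
    beta=0.1,  # strength of the KL penalty keeping the policy near the reference model
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # called `tokenizer` in older TRL versions
)
trainer.train()
```

The key design point DPO exploits is that the preference loss can be written in terms of the policy and a frozen reference copy of it alone; the trainer therefore needs only the preference-pair dataset and the beta temperature, with no reward-model training or PPO loop.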