google-gemini/gemma-cookbook

Aligning a Gemma 2B model with DPO.

Closed this issue · 5 comments

Description of the feature request:

I want to create a new notebook for the cookbook showing how to align a Gemma model using DPO with a public dataset from Hugging Face.

I'm going to use this notebook as a base:
Aligning_DPO_open_gemma-2b-it.ipynb

I will add more explanations and adapt the format to meet the cookbook's requirements.
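For context, this is roughly the shape of the DPO step I have in mind, using Hugging Face TRL. It is only a minimal sketch: the dataset, output directory and hyperparameters are illustrative placeholders, and the exact trl API (for example DPOConfig and the tokenizer argument) may vary between versions.

```python
# Minimal sketch of the DPO step with Hugging Face TRL. The dataset, output
# directory and hyperparameters are illustrative placeholders; the exact trl
# API may differ slightly between versions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Any public preference dataset with chosen/rejected pairs works; this one is
# just an example and may not be the one used in the notebook.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="gemma-2b-it-dpo",   # placeholder output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=100,                  # short run for demonstration purposes
    beta=0.1,                       # strength of the KL penalty vs. the reference model
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,            # TRL builds a frozen reference copy when this is None
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,       # newer trl releases name this argument processing_class
)
trainer.train()
```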

What problem are you trying to solve with this feature?

No response

Any other information you'd like to share?

No response

This looks interesting. Could you

  1. follow the steps here?

  2. come up with another inference demo, since the current one follows the instruction but gives a mathematically wrong answer?

Thanks @windmaple.

The model's behavior changes with the alignment process: in the response from the unaligned model, the instruction to return only numbers is ignored. I will adapt the inference example to an operation that returns the correct value, or look for another example, along the lines of the sketch below.
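A possible replacement demo asks the aligned model for a simple operation whose result is easy to check, so both instruction-following and correctness can be verified. The checkpoint path, prompt and expected answer below are placeholders, not the notebook's actual values.

```python
# Hypothetical replacement inference demo: the checkpoint path, prompt and
# expected answer are placeholders, not the notebook's actual values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gemma-2b-it-dpo"  # local path of the DPO-aligned checkpoint (assumed name)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "What is 12 * 12? Answer with the number only."}]
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=10, do_sample=False)
answer = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)  # expected "144", so both the instruction and the arithmetic can be checked
```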

On the other hand, I'm not sure whether to keep the code that publishes the model on Hugging Face as an example, or if it would be better to remove it and focus the notebook solely on the DPO process.

Yes, please keep this part of the code. We want to make it as easy as possible to publish any fine-tuned model.
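For reference, publishing usually only takes a couple of calls once the model is saved. A minimal sketch is below; the repository name is a placeholder and an authenticated Hugging Face token is assumed.

```python
# Sketch of publishing the fine-tuned model to the Hugging Face Hub. The
# repository name is a placeholder, and an authenticated token (for example
# via `huggingface-cli login`) is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

output_dir = "gemma-2b-it-dpo"  # local directory saved by the trainer (assumed name)
model = AutoModelForCausalLM.from_pretrained(output_dir)
tokenizer = AutoTokenizer.from_pretrained(output_dir)

model.push_to_hub("your-username/gemma-2b-it-dpo")
tokenizer.push_to_hub("your-username/gemma-2b-it-dpo")
```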

Thank you for your contribution👍👍! We will try to get it featured on Google's social account soon.

Thank you! It was a pleasure!