Possible Inconsistency (Possibly a Typo) in the Gradient Definition Between Eq. 7 and Appendix A.4 of the DPO Paper
rustic-snob opened this issue · 1 comment
Hi,
I am Jaewon Cheon, and I am currently studying NLP and LLMs in South Korea. I would like to begin by expressing my sincere appreciation for your pioneering work on DPO. It has had a profound impact on the research community and given many practitioners an efficient way to tune LLMs locally to their preferences, avoiding the often arduous process of RL training and its complex prerequisites.
While reading your paper closely, I believe I found a possible oversight in the notation of the gradient definition.
In Appendix A.4, specifically Eq. 21, the order of the terms appears to be inverted compared with the main text (Eq. 7).
Although this may be a minor detail, I thought it prudent to bring it to your attention so that the paper's methodology stays accurate and clear.
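For reference, the main text's analysis of the DPO update writes the gradient as -β E[σ(r̂_θ(x, y_l) - r̂_θ(x, y_w)) (∇_θ log π_θ(y_w|x) - ∇_θ log π_θ(y_l|x))], with the implicit reward r̂_θ(x, y) = β log(π_θ(y|x) / π_ref(y|x)). Because d/du[-log σ(u)] = -σ(-u), the sigmoid weight must take "rejected minus chosen". The snippet below is a minimal sketch that checks this order numerically for a single preference pair. It assumes per-sequence log-probabilities are given as plain floats, and all function names and input values are hypothetical. It does not reproduce Eq. 21 itself; it only checks the order of the weighting term against a finite-difference gradient.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def implicit_rewards(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    # r_hat(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x))
    return beta * (logp_w - ref_logp_w), beta * (logp_l - ref_logp_l)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    # Per-pair DPO loss: -log sigma(r_hat_w - r_hat_l)
    r_w, r_l = implicit_rewards(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
    return -math.log(sigmoid(r_w - r_l))

def analytic_grad(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
    # Gradient w.r.t. the policy log-probs of y_w and y_l.
    # The weight is sigma(r_hat_l - r_hat_w): rejected minus chosen.
    r_w, r_l = implicit_rewards(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
    weight = sigmoid(r_l - r_w)
    return -beta * weight, beta * weight

def numeric_grad(f, args, i, eps=1e-6):
    # Central finite difference in the i-th argument.
    up, down = list(args), list(args)
    up[i] += eps
    down[i] -= eps
    return (f(*up) - f(*down)) / (2 * eps)

args = (-2.0, -3.0, -2.5, -2.0, 0.1)  # made-up log-probs and beta
g_w, g_l = analytic_grad(*args)
print(abs(g_w - numeric_grad(dpo_loss, args, 0)) < 1e-6)  # True
print(abs(g_l - numeric_grad(dpo_loss, args, 1)) < 1e-6)  # True
```

Swapping the weight to σ(r̂_w - r̂_l) makes the check fail whenever the two implicit rewards differ. This is why the order of the terms inside the sigmoid matters.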
Thank you once again for your remarkable contribution to the field.
Kind regards,
Jaewon Cheon
Thanks for pointing this out, Jaewon! I believe an earlier version of the paper had this reversal in the main text, and when we fixed it there we forgot to fix the appendix as well. We'll correct it in the next revision of the paper.