eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Python · Apache-2.0
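The repository implements the DPO objective: a preference pair is scored by the difference between the policy-vs-reference log-ratios of the chosen and rejected responses. As a minimal sketch, assuming PyTorch (the `dpo_loss` name and argument names below are illustrative, not necessarily the repo's exact API):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Per-example DPO loss from summed log-probabilities of each response.

    All inputs are 1-D tensors of shape (batch,). beta scales the implicit
    reward and controls how far the policy may drift from the reference.
    Illustrative sketch only, not the repository's exact function.
    """
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
    losses = -F.logsigmoid(beta * (chosen_logratios - rejected_logratios))
    # Implicit rewards, handy for logging accuracies and margins
    chosen_rewards = beta * chosen_logratios.detach()
    rejected_rewards = beta * rejected_logratios.detach()
    return losses, chosen_rewards, rejected_rewards
```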
Issues
When trying to reproduce the complete example, "NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet" is thrown
#91 opened by ZSvedic - 2
Error when following the README to train SFT on multiple cards using FSDPTrainer
#51 opened by NekoMimiUnagi - 0
ValueError when using peft on FSDPTrainer
#90 opened by AragornHorse - 5
Qwen model issues: embedding and loss contain NaN
#52 opened by lylcst - 2
In DPO training, I got this ‘train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}’
#89 opened by Alan-D-Chen - 15
llama7B issue
#42 opened by JiuhaiChen - 3
Hi @eric-mitchell,
#82 opened by Gryff1ndor - 1
Training process got stuck when loss=dpo sample_during_eval=true trainer=FSDPTrainer
#85 opened by kygguo - 0
GPT4 prompt when evaluating DPO
#88 opened by kygguo - 0
Computing faster logps
#72 opened by alexvishnevskiy - 5
Question about average_log_prob
#40 opened by Kyeongpil - 0
How are evals done on trained models?
#83 opened by lesnikow - 2
Unable to Run SFT
#66 opened by Rui-Yuan91 - 3
Question about _get_batch_logps of trainers.py (see the sketch after this list)
#57 opened by wulaoshi - 1
Where is the config documentation for IPO?
#81 opened by 3244we - 0
Using Mistral 7B with transformers v4.38.1 on the MATH dataset and facing memory leaks
#80 opened by Jayant1234 - 1
Division by Zero error sporadically occurs
#78 opened by Jayant1234 - 2
Question about fine-tuning steps (epochs)
#58 opened by gyuwon12 - 1
Implementation for Plackett-Luce rank model
#71 opened by rohan598 - 1
Reproducing Win Rate inference for TL;DR
#62 opened by jdchang1 - 2
Pythia 2.8B model weights
#50 opened by alexv-cerebras - 0
Question about IPO loss vs DPO loss
#64 opened by MoonBlvd - 2
Using cross entropy loss to calculate DPO?
#67 opened by zachares - 0
Can DPO work on BERT-style models?
#75 opened by Leo-T-Zang - 0
The number of training steps in the SHP dataset
#73 opened by bonin147 - 9
Question about average_log_prob
#48 opened by LSX-Sneakerprogrammer - 0
My Code to Reproduce IMDB
#69 opened by QiyaoWei - 2
Bug in loading Llama tokenizer?
#65 opened by ajyl - 2
Appendix A.4 of the paper: the derived gradient is not consistent with the main text
#63 opened by yflyzhang - 1
Does reference-free mode work?
#38 opened by JosephZZ - 1
Training cost: RLHF vs DPO
#55 opened by kartheekmedathati - 1
Unable to run the code for Step 2: Run SFT
#59 opened by ppsmk388 - 1
Possible Inconsistency(Possibly Typo) in Gradient Definition Between Eq. 7 and Appendix A.4 in DPO paper
#60 opened by rustic-snob - 2
Questions about the IMDB Sentiment dataset
#45 opened by stevie1023 - 2
How to load trained model for inference?
#49 opened by VibhuAg - 0
Is fine-tuning with, e.g., LoRA supported?
#43 opened by Emerald01 - 0
Strange loss pattern
#41 opened by puyuanOT - 0
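Several of the issues above (e.g. #40, #48, #57, #72) ask how per-sequence log-probabilities are computed and whether they should be summed or length-averaged over response tokens. A minimal sketch of that computation, assuming PyTorch; the function mirrors the role of `_get_batch_logps` in trainers.py, but the exact signature here is an assumption:

```python
import torch

def get_batch_logps(logits, labels, average_log_prob=False):
    """Log-probability of each labeled sequence under the model.

    logits: (batch, seq_len, vocab). labels: (batch, seq_len), with -100
    marking prompt and padding tokens that must not contribute.
    Sketch only; names follow common usage, not necessarily the repo's API.
    """
    # Shift so that the logits at position t-1 predict the label at position t.
    logits = logits[:, :-1, :]
    labels = labels[:, 1:].clone()
    loss_mask = labels != -100
    labels[labels == -100] = 0  # dummy index for gather; masked out below

    # Per-token log-probabilities of the observed labels.
    per_token_logps = torch.gather(
        logits.log_softmax(-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)

    if average_log_prob:
        # Length-normalized score: mean log-prob over response tokens.
        return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
    # Unnormalized score: sum of log-probs over response tokens.
    return (per_token_logps * loss_mask).sum(-1)
```

Whether to sum or to length-normalize is exactly what the average_log_prob questions are about; the summed form matches the DPO derivation, while averaging changes how the implicit reward scales with response length.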