eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Python · Apache-2.0
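
The repository implements the DPO objective of Rafailov et al. For orientation on the issue list below, here is a minimal PyTorch sketch of that loss; it is not the repository's exact code, and the function name, argument names, and default `beta` are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    All inputs are per-example sequence log-probabilities of shape [batch].
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios
    losses = -F.logsigmoid(beta * logits)
    # Implied per-example rewards, handy for logging chosen/rejected reward margins.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps).detach()
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps).detach()
    return losses.mean(), chosen_rewards, rejected_rewards
```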
Issues
GPT-4 prompt when evaluating DPO
#88 opened · 0 comments
How are evals done on trained models?
#83 opened · 3 comments
Hi @eric-mitchell,
#82 opened · 1 comment
Where is the config documentation for IPO?
#81 opened · 0 comments
Can DPO work on BERT-style Model?
#75 opened · 0 comments
Computing logps faster
#72 opened · 1 comment
My Code to Reproduce IMDB
#69 opened · 1 comment
Unable to Run SFT
#66 opened · 1 comment
Bug in loading Llama tokenizer?
#65 opened · 1 comment
Question about IPO loss vs DPO loss
#64 opened · 2 comments (see the IPO sketch after this list)
Reproducing Win Rate inference for TL;DR
#62 opened · 2 comments
Question about fine-tuning steps (epochs)
#58 opened · 3 comments
Training cost: RLHF vs DPO
#55 opened · 0 comments
Pythia 2.8B model weights
#50 opened · 2 comments
How to load trained model for inference?
#49 opened · 9 comments (see the loading sketch after this list)
Question about average_log_prob
#48 opened · 1 comment
llama7B issue
#42 opened · 0 comments
Strange loss pattern
#41 opened · 5 comments
Question about average_log_prob
#40 opened · 0 comments
Does reference-free mode work?
#38 opened
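
Several issues above concern the loss variants: #64 asks about IPO vs. DPO, #38 about reference-free mode, and #48/#40 about `average_log_prob`. The sketch below follows the IPO paper's formulation rather than the repository's exact API; the function and argument names, the `tau` default, and the placement of the `reference_free` flag are assumptions. IPO is usually stated with length-normalized (per-token average) log-probs, which appears to be what the `average_log_prob` questions are about, whereas DPO uses summed token log-probs.

```python
import torch

def ipo_loss(policy_chosen_avg_logps, policy_rejected_avg_logps,
             ref_chosen_avg_logps, ref_rejected_avg_logps,
             tau=0.1, reference_free=False):
    """Sketch of the IPO loss: a squared regression of the log-ratio
    difference toward 1/(2*tau), instead of DPO's log-sigmoid penalty.
    """
    ref_logratios = ref_chosen_avg_logps - ref_rejected_avg_logps
    if reference_free:
        # Reference-free mode (#38): treat the reference policy as uniform,
        # i.e. drop its log-ratio from the objective.
        ref_logratios = torch.zeros_like(ref_logratios)
    logits = (policy_chosen_avg_logps - policy_rejected_avg_logps) - ref_logratios
    return ((logits - 1.0 / (2.0 * tau)) ** 2).mean()
```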
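
Issue #49, the most-commented one above, asks how to load a trained policy for inference. A minimal sketch with Hugging Face Transformers, under the assumption that training saved a `policy.pt` checkpoint holding the weights under a `'state'` key; the base model name and checkpoint path below are placeholders, not values taken from the repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "EleutherAI/pythia-2.8b"   # placeholder: whichever base model was fine-tuned
CKPT_PATH = "path/to/LATEST/policy.pt"  # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

ckpt = torch.load(CKPT_PATH, map_location="cpu")
# Assumption: the checkpoint is a dict with the weights under a 'state' key;
# if it is a bare state_dict, pass `ckpt` directly instead.
model.load_state_dict(ckpt.get("state", ckpt))
model.eval()

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```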