eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Python · Apache-2.0 license
Issues
- Understanding loss (#36)
- question about degeneration problem (#35)
- Multi-node optimizer save error (#33)
- Request for PPO code (#31)
- IMDB Dataset Experiments (#27)
- Bradley-Terry (BT) model (#24)
- Loss is not converging during training DPO (#20)
- Inference code example? (#13)
- The role of the hyper param Beta (#4)
- About apply Qlora for DPO training (#1)
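Several of these issues concern the DPO objective itself (understanding the loss, non-converging loss, the role of the β hyperparameter). As background, here is a minimal sketch of the per-example DPO loss in plain Python; the function and argument names are hypothetical, not taken from this repository's code, and a real implementation would work on batched tensors with a numerically stable log-sigmoid:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-probability ratios of the trained policy against the frozen reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how sharply the loss responds to the margin between the two ratios
    logits = beta * (chosen_logratio - rejected_logratio)
    # Plain logistic loss; production code would use a stable logsigmoid instead
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy matches the reference exactly, the loss is log 2 for any beta
print(round(dpo_loss(-5.0, -7.0, -5.0, -7.0), 4))  # → 0.6931
```

A larger β makes the loss steeper in the log-ratio margin, i.e. it penalizes deviations from the reference model more aggressively, which is the trade-off the β issue above is asking about.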