eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Python · Apache-2.0 license
Issues
- Understanding loss (#36)
- question about degeneration problem (#35)
- Multi-node optimizer save error (#33)
- Request for PPO code (#31)
- IMDB Dataset Experiments (#27)
- Bradley-Terry (BT) model (#24)
- Loss is not converging during training DPO (#20)
- Inference code example? (#13)
- The role of the hyper param Beta (#4)
- About apply Qlora for DPO training (#1)
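Several of these issues concern the DPO objective itself (understanding the loss, non-converging loss, the role of the β hyperparameter). As background, here is a minimal sketch of the per-example DPO loss in plain Python; the function and argument names are hypothetical, not taken from this repository's code, and a real implementation would work on batched tensors with a numerically stable log-sigmoid:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-probability ratios of the trained policy against the frozen reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how sharply the loss responds to the margin between the two ratios
    logits = beta * (chosen_logratio - rejected_logratio)
    # Plain logistic loss; production code would use a stable logsigmoid instead
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy matches the reference exactly, the loss is log 2 for any beta
print(round(dpo_loss(-5.0, -7.0, -5.0, -7.0), 4))  # → 0.6931
```

A larger β makes the loss steeper in the log-ratio margin, i.e. it penalizes deviations from the reference model more aggressively, which is the trade-off the β issue above is asking about.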