eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Python · Apache-2.0
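
The repository implements the DPO objective of Rafailov et al. For orientation on the issue list below, here is a minimal PyTorch sketch of that loss; it is not the repository's exact code, and the function name, argument names, and default `beta` are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    All inputs are per-example sequence log-probabilities of shape [batch].
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios
    losses = -F.logsigmoid(beta * logits)
    # Implied per-example rewards, handy for logging chosen/rejected reward margins.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps).detach()
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps).detach()
    return losses.mean(), chosen_rewards, rejected_rewards
```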
Issues
GPT-4 prompt when evaluating DPO
#88 opened · 0 comments
How are evals done on trained models?
#83 opened · 3 comments
Hi @eric-mitchell,
#82 opened · 1 comment
Where is the config documentation for IPO?
#81 opened · 0 comments
Can DPO work on BERT-style Model?
#75 opened · 0 comments
Computing logps faster
#72 opened · 1 comment
My Code to Reproduce IMDB
#69 opened · 1 comment
Unable to Run SFT
#66 opened · 1 comment
Bug in loading Llama tokenizer?
#65 opened · 1 comment
Question about IPO loss vs DPO loss
#64 opened · 2 comments (see the IPO sketch after this list)
Reproducing Win Rate inference for TL;DR
#62 opened · 2 comments
Question about fine-tuning steps (epochs)
#58 opened · 3 comments
Training cost: RLHF vs DPO
#55 opened · 0 comments
Pythia 2.8B model weights
#50 opened · 2 comments
How to load trained model for inference?
#49 opened · 9 comments (see the loading sketch after this list)
Question about average_log_prob
#48 opened · 1 comment
llama7B issue
#42 opened · 0 comments
Strange loss pattern
#41 opened · 5 comments
Question about average_log_prob
#40 opened · 0 comments
Does reference-free mode work?
#38 opened
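
Several issues above concern the loss variants: #64 asks about IPO vs. DPO, #38 about reference-free mode, and #48/#40 about `average_log_prob`. The sketch below follows the IPO paper's formulation rather than the repository's exact API; the function and argument names, the `tau` default, and the placement of the `reference_free` flag are assumptions. IPO is usually stated with length-normalized (per-token average) log-probs, which appears to be what the `average_log_prob` questions are about, whereas DPO uses summed token log-probs.

```python
import torch

def ipo_loss(policy_chosen_avg_logps, policy_rejected_avg_logps,
             ref_chosen_avg_logps, ref_rejected_avg_logps,
             tau=0.1, reference_free=False):
    """Sketch of the IPO loss: a squared regression of the log-ratio
    difference toward 1/(2*tau), instead of DPO's log-sigmoid penalty.
    """
    ref_logratios = ref_chosen_avg_logps - ref_rejected_avg_logps
    if reference_free:
        # Reference-free mode (#38): treat the reference policy as uniform,
        # i.e. drop its log-ratio from the objective.
        ref_logratios = torch.zeros_like(ref_logratios)
    logits = (policy_chosen_avg_logps - policy_rejected_avg_logps) - ref_logratios
    return ((logits - 1.0 / (2.0 * tau)) ** 2).mean()
```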
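
Issue #49, the most-commented one above, asks how to load a trained policy for inference. A minimal sketch with Hugging Face Transformers, under the assumption that training saved a `policy.pt` checkpoint holding the weights under a `'state'` key; the base model name and checkpoint path below are placeholders, not values taken from the repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "EleutherAI/pythia-2.8b"   # placeholder: whichever base model was fine-tuned
CKPT_PATH = "path/to/LATEST/policy.pt"  # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

ckpt = torch.load(CKPT_PATH, map_location="cpu")
# Assumption: the checkpoint is a dict with the weights under a 'state' key;
# if it is a bare state_dict, pass `ckpt` directly instead.
model.load_state_dict(ckpt.get("state", ckpt))
model.eval()

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```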