Issues
Question about RRHF
#5 opened by SihengLi99 - 1
QLORA -4bit
#14 opened by seroetr - 3
Unable to reproduce the results of DPO
#23 opened by AGTSAAA - 1
simloss has no length normalization in simpo loss
#32 opened by hjc3613 - 0
Yi-34B + Simpo + Full + Novel writing Task, fine-tuning results are unsatisfactory
#33 opened by onlyfish79 - 5
Unable to reproduce the results of SFT
#27 opened by yujiaw98 - 1
Why is the AlpacaEval2 score of meta-llama/Meta-LLama-3-8B-Instruct in the paper higher than that on the leaderboard?
#31 opened by eugene-yh - 1
Usage on Custom Dataset
#28 opened by ViperVille007 - 2
DPOTrainer.get_batch_logps() got an unexpected keyword argument 'average_log_prob'
#29 opened by RAY2L - 0
Question about apply_chat_template to prompt
#30 opened by EganGu - 2
Change of Mistral Chat template
#26 opened by mianzhang - 2
Training leads to model collapse
#12 opened by Ricardokevins - 2
TRL version should be 0.8.6
#25 opened by blakechi - 1
Length normalization in DPO and other variants
#20 opened by yakazimir - 3
For the Instruct setup, why do different models require different training datasets? Can the same dataset be used?
#19 opened by qiuwenbogdut - 2
ValueError: Unknown split "train". Should be one of ['train_iteration_1', 'test_iteration_1', 'train_iteration_2', 'test_iteration_2',
#18 opened by qiuwenbogdut - 6
the outputs of reproduced model has "<|start_header_id|>assistant<|end_header_id|>" at beginning
#17 opened by binzhwang - 2
accelerator.prepare() CUDA out of memory
#10 opened by zouce - 1
Question about the length-normalization
#13 opened by yujiaw98 - 2
Confusing Code logic
#15 opened by syboomsy - 4
On Length Normalization in the Code
#4 opened by jc-ryan - 7
Mismatch of results
#9 opened by AGTSAAA - 3
About the version of alignment-handbook
#6 opened by lucasliunju - 0
Length normalization
#7 opened by Michelleable - 1
Repeated Addition of Assistant Turn in Prompt/Chosen/Rejected Text Using `apply_chat_template`
#8 opened by iseesaw - 1
Integrate upstream in alignment-handbook
#2 opened by BramVanroy - 1
Upstream `SimPOTrainer` to TRL
#3 opened by philschmid