princeton-nlp/SimPO

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

Python · MIT

Issues

How to run the evaluation on GPT-4 Turbo
#49 opened 6 months ago by Xalp, 2 comments
Looking for the model parameters of Llama3-Instruct (8B) SFT
#63 opened 4 months ago by alphatogo, 1 comment
Differences when changing gradient accumulation: ZeroEval and AlpacaEval 2
#61 opened 5 months ago by sahsaeedi, 4 comments
Question about the `annotators_config` and `reference_outputs` in alpaca_eval
#55 opened 5 months ago by AIR-hl, 1 comment
On-Policy Preference Data Generation
#62 opened 5 months ago by tannedbum, 2 comments
Evaluation on Arena-Hard
#57 opened 5 months ago by hitszxs, 3 comments
reward/chosen is decreasing
#42 opened 6 months ago by zhangguoxin1, 6 comments
LoRA with SimPO
#39 opened 6 months ago by hitszxs, 6 comments
Unable to reproduce the results of SFT
#27 opened 6 months ago by yujiaw98, 8 comments
Where is length normalization in the code?
#59 opened 5 months ago by ElegantLin, 2 comments
Cannot reproduce the training curves of LLaMA3-simpo-v2
#60 opened 5 months ago by jdf-prog, 2 comments
Can you share the loss log of `Llama-3-8B-Instruct`?
#45 opened 6 months ago by AIR-hl, 3 comments
ArgumentError: `attn_implementation`
#58 opened 5 months ago by ElegantLin, 2 comments
What version of Mistral 7B was used for princeton-nlp/Mistral-7B-Instruct-SimPO?
#56 opened 5 months ago by junkangwu, 1 comment
Can't reproduce the results of Mistral-7B-Instruct-DPO
#38 opened 6 months ago by RikkiXu, 10 comments
Bug when using accelerate
#54 opened 6 months ago by cjakfskvnad, 0 comments
Yi-34B + SimPO + Full + novel-writing task: fine-tuning results are unsatisfactory
#33 opened 6 months ago by onlyfish79, 3 comments
Why is the AlpacaEval2 score of meta-llama/Meta-Llama-3-8B-Instruct in the paper higher than that on the leaderboard?
#31 opened 6 months ago by eugene-yh, 1 comment
DPOTrainer.get_batch_logps() got an unexpected keyword argument 'average_log_prob'
#29 opened 6 months ago by RAY2L, 2 comments
Unable to reproduce the MT-Bench results in the paper
#22 opened 6 months ago by lmx760581375, 1 comment
Has anyone reproduced the results in the paper quantitatively?
#21 opened 6 months ago by binzhwang, 7 comments
For the Instruct setup, why do different models require different training datasets? Can the same dataset be used?
#19 opened 6 months ago by qiuwenbogdut, 3 comments
The outputs of the reproduced model have "<|start_header_id|>assistant<|end_header_id|>" at the beginning
#17 opened 6 months ago by binzhwang, 6 comments
'loss': 0.0, 'grad_norm': nan, and get
#35 opened 6 months ago by wujia11, 1 comment
Hi, could you please provide the exact versions of the packages needed to reproduce the results and your released checkpoint?
#40 opened 6 months ago by AGTSAAA, 1 comment
Use a better `ultrafeedback`
#43 opened 6 months ago by AIR-hl, 1 comment
Questions about recent changes
#51 opened 6 months ago by junkangwu, 2 comments
Model Request: Meta-Llama-3.1-8B-Instruct
#52 opened 6 months ago by Bearsaerker, 1 comment
No gemma2 template when evaluating on AE2
#53 opened 6 months ago by sunjie279, 1 comment
Could you share the package versions for alpaca_eval 2 and the generated answers?
#44 opened 6 months ago by sunjie279, 1 comment
Cannot install alignment-handbook==0.4.0
#48 opened 6 months ago by Shentao-YANG, 1 comment
Any techniques for hyperparameter tuning?
#50 opened 6 months ago by NitinAB1108, 2 comments
Could you add licenses to the preference datasets (after reward-model labeling) on Hugging Face?
#46 opened 6 months ago by hanyang1999, 3 comments
Experimental results on the ARC-C subset for challenging reasoning?
#47 opened 6 months ago by tongyx361, 0 comments
How to use a local dataset
#41 opened 6 months ago by mazhengyufreedom, 3 comments
About label_smoothing
#36 opened 6 months ago by mazhengyufreedom, 2 comments
AttributeError: 'SimPOConfig' object has no attribute 'ref_model_init_kwargs'. Did you mean 'model_init_kwargs'?
#34 opened 6 months ago by Saumajit, 2 comments
Why do the same methods have different results?
#37 opened 6 months ago by XiepengLi, 2 comments
Unable to reproduce the results of DPO
#23 opened 6 months ago by AGTSAAA, 4 comments
Question about applying apply_chat_template to the prompt
#30 opened 6 months ago by EganGu, 2 comments
Your comparison is unfair because it uses different chat templates
#24 opened 6 months ago by AGTSAAA, 2 comments
Usage on a custom dataset
#28 opened 6 months ago by ViperVille007, 1 comment
QLoRA 4-bit
#14 opened 6 months ago by seroetr, 1 comment
Question about the length normalization
#13 opened 6 months ago by yujiaw98, 1 comment
simloss has no length normalization in the SimPO loss
#32 opened 6 months ago by hjc3613, 3 comments
Change of Mistral chat template
#26 opened 7 months ago by mianzhang, 2 comments
TRL version should be 0.8.6
#25 opened 7 months ago by blakechi, 2 comments
Length normalization in DPO and other variants
#20 opened 7 months ago by yakazimir, 2 comments
ValueError: Unknown split "train". Should be one of ['train_iteration_1', 'test_iteration_1', 'train_iteration_2', 'test_iteration_2',
#18 opened 7 months ago by qiuwenbogdut, 2 comments
Confusing code logic
#15 opened 7 months ago by syboomsy, 2 comments