princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Python · MIT license
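Several issues below (#13, #20, #32, #59) ask where SimPO's length normalization happens. As a minimal sketch of the reference-free objective described in the paper, assuming the chosen/rejected log-probabilities have already been averaged over response tokens (the `beta` and `gamma` values here are illustrative, not the repo's defaults):

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_avg_logps: torch.Tensor,
               rejected_avg_logps: torch.Tensor,
               beta: float = 2.0,
               gamma: float = 1.0) -> torch.Tensor:
    """Sketch of the SimPO objective on length-normalized log-probs.

    The reference-free reward is beta times the *average* per-token
    log-probability of a response, so the length normalization lives
    in the upstream averaging (mean over response tokens, not sum).
    beta/gamma defaults here are illustrative, not the repo's settings.
    """
    margin = beta * (chosen_avg_logps - rejected_avg_logps) - gamma
    # Bradley-Terry-style loss with a target reward margin gamma.
    return -F.logsigmoid(margin).mean()
```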
Issues
How to run the evaluation on GPT-4 Turbo
#49 opened by Xalp · 1 comment
Differences in results when changing gradient accumulation - ZeroEval and AlpacaEval 2
#61 opened by sahsaeedi · 1 comment
On-Policy Preference Data Generation
#62 opened by tannedbum · 3 comments
Evaluation on Arena-Hard
#57 opened by hitszxs · 6 comments
reward/chosen is decreasing
#42 opened by zhangguoxin1 · 6 comments
LoRA with SimPO
#39 opened by hitszxs · 8 comments
Unable to reproduce the results of SFT
#27 opened by yujiaw98 · 2 comments
Where is length normalization in the code?
#59 opened by ElegantLin · 2 comments
ArgumentError: `attn_implementation`
#58 opened by ElegantLin · 1 comment
What version of Mistral 7B was used for princeton-nlp/Mistral-7B-Instruct-SimPO?
#56 opened by junkangwu · 10 comments
Bug when using accelerate
#54 opened by cjakfskvnad · 3 comments
Yi-34B + SimPO + full fine-tuning + novel-writing task: results are unsatisfactory
#33 opened by onlyfish79 · 1 comment
Why is the AlpacaEval 2 score of meta-llama/Meta-Llama-3-8B-Instruct in the paper higher than that on the leaderboard?
#31 opened by eugene-yh · 2 comments
DPOTrainer.get_batch_logps() got an unexpected keyword argument 'average_log_prob'
#29 opened by RAY2L · 1 comment
For the Instruct setup, why do different models require different training datasets? Can the same dataset be used?
#19 opened by qiuwenbogdut · 6 comments
The outputs of the reproduced model have "<|start_header_id|>assistant<|end_header_id|>" at the beginning
#17 opened by binzhwang · 1 comment
Getting 'loss': 0.0 and 'grad_norm': nan
#35 opened by wujia11 · 1 comment
Could you provide the exact package versions needed to reproduce the results and your released checkpoint?
#40 opened by AGTSAAA · 1 comment
Use a better `ultrafeedback`
#43 opened by AIR-hl · 2 comments
Questions about recent changes
#51 opened by junkangwu · 1 comment
Model Request: Meta-Llama-3.1-8B-Instruct
#52 opened by Bearsaerker · 1 comment
No chat template for Gemma 2 when evaluating it on AlpacaEval 2
#53 opened by sunjie279 · 1 comment
Could you provide the package versions for alpaca_eval 2 and the generated answers?
#44 opened by sunjie279 · 1 comment
Cannot install alignment-handbook==0.4.0
#48 opened by Shentao-YANG · 2 comments
Any technique for hyperparameter tuning?
#50 opened by NitinAB1108 · 3 comments
Could you add licenses to the reward-model-labeled preference datasets on Hugging Face?
#46 opened by hanyang1999 · 0 comments
How to use a local dataset
#41 opened by mazhengyufreedom · 2 comments
About label_smoothing
#36 opened by mazhengyufreedom · 2 comments
AttributeError: 'SimPOConfig' object has no attribute 'ref_model_init_kwargs'. Did you mean 'model_init_kwargs'?
#34 opened by Saumajit · 2 comments
Why do the same methods have different results?
#37 opened by XiepengLi · 4 comments
Unable to reproduce the results of DPO
#23 opened by AGTSAAA · 2 comments
Question about applying apply_chat_template to the prompt
#30 opened by EganGu · 2 comments
Usage on Custom Dataset
#28 opened by ViperVille007 · 1 comment
QLoRA 4-bit
#14 opened by seroetr · 1 comment
Question about length normalization
#13 opened by yujiaw98 · 3 comments
No length normalization in the SimPO loss code
#32 opened by hjc3613 · 2 comments
Change of Mistral Chat template
#26 opened by mianzhang · 2 comments
TRL version should be 0.8.6
#25 opened by blakechi · 2 comments
Length normalization in DPO and other variants
#20 opened by yakazimir · 2 comments
ValueError: Unknown split "train". Should be one of ['train_iteration_1', 'test_iteration_1', 'train_iteration_2', 'test_iteration_2',
#18 opened by qiuwenbogdut · 2 comments
Confusing code logic
#15 opened by syboomsy