Issues
Scores and probability calculations
#15 opened by namdw - 13
DPO baseline implementation
#22 opened by yesiam-png - 3
SPPO Implementation on Axolotl!
#21 opened by kaykyr - 1
Adaptation for 4-bit Quantization Training/Responses Generation (with 2 Home GPUs)
#16 opened by kaykyr - 1
Any chance it works on my homelab?
#13 opened by kaykyr - 2
Suggestion: Gemma 2 9B and 27B.
#3 opened by kaykyr - 4
Ranking speed & training hyperparameters
#10 opened by skramer-dev - 0
Some packages' versions are too old
#7 opened by qy1026 - 1
Questions about the training code
#6 opened by blackblue9 - 0
ShareGPT appending
#4 opened by Kquant03 - 1
Is it possible to run llama 3-70B and/or mixtral 8x22b through this process?
#1 opened by RandomInternetPreson - 2
ConnectionError: Couldn't reach 'synthetic_data_llama-3-8b-instruct-sppo-iter3_score' on the Hub (ConnectionError)
#2 opened by xinghuang2050