Edward-Sun/easy-to-hard

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Python · BSD-3-Clause

Issues

Question about REST-EM
#9 opened 2 months ago by mandyyyyii · 1 comment
Question about Paper: Is it possible to combine SFT and reward model?
#8 opened 3 months ago by Site1997 · 1 comment
question about reward score
#7 opened 4 months ago by DecideToLeave · 6 comments
Change in PRM loss
#6 opened 4 months ago by DecideToLeave · 0 comments
SFT error
#5 opened 6 months ago by zxy-smart · 0 comments
Two questions about the article
#4 opened 6 months ago by xiaolizh1 · 4 comments
About the training scripts.
#3 opened 6 months ago by Zjshadow · 1 comment
readme for data
#2 opened 6 months ago by rguo12 · 2 comments
Question about the results of PRM and ORM
#1 opened 7 months ago by HillZhang1999 · 1 comment