lucidrains/self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
Python · MIT
Issues
- 0
usage demo is not working
#32 opened by 652994331 - 3
UnboundLocalError: local variable 'self_reward_model' referenced before assignment
#24 opened by UbeCc - 1
What's the reference model for DPO?
#31 opened by Draconda - 3
How to use HF Transformers model
#10 opened by fakerybakery - 0
Multiple GPUs
#14 opened by fakerybakery - 1
Why use a custom sample function instead of original HuggingFace generate() function?
#11 opened by scarydemon2 - 6
The reward prompt is weak.
#7 opened by Minami-su - 3
run spin demo
#8 opened by westlongtime - 4
Is this work in progress?
#4 opened by jbdatascience - 1
Help with Setting up and running ?
#3 opened by badboysm890 - 0
code and dataset?
#1 opened by wanghao-007