mrahtz/learning-from-human-preferences

Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"

Language: Python · License: MIT

Issues

Custom Environment
#17 opened 4 months ago by nil123532 · 0 comments

Using Reward Predictor
#16 opened 10 months ago by eunjuyummy · 1 comment

The output is always waiting for preferences, 0 so far.
#3 opened 6 years ago by ZhanPython · 3 comments

GRPC error
#15 opened 2 years ago by errorer-max · 3 comments

GRPC error
#14 opened 2 years ago by errorer-max · 0 comments

Extra instructions for Ubuntu
#4 opened 6 years ago by eggsyntax · 18 comments

Does not run on Windows
#11 opened 3 years ago by jgocm · 2 comments

Adjusting softmax function
#8 opened 4 years ago by jakkarn · 4 comments

Doubt on normalizing rewards
#5 opened 5 years ago by SestoAle · 1 comment

Synthetic preferences - no preferences received
#2 opened 7 years ago by JawwadF · 4 comments