mrahtz/learning-from-human-preferences
Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
Python, MIT License
Issues
- 0
Custom Environment
#17 opened by nil123532 - 1
Using Reward Predictor
#16 opened by eunjuyummy - 3
- 3
GRPC error
#15 opened by errorer-max - 0
GRPC error
#14 opened by errorer-max - 18
Extra instructions for Ubuntu
#4 opened by eggsyntax - 2
Does not run on Windows
#11 opened by jgocm - 4
Adjusting softmax function
#8 opened by jakkarn - 1
Doubt on normalizing rewards
#5 opened by SestoAle - 4