You need to install `swig` and `ffmpeg`. Then run `pip install -r requirements.txt`.
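Before installing the Python requirements, it can help to confirm that the two system tools are actually on your `PATH`. The snippet below is a minimal sketch that only uses the standard library; the helper name `missing_tools` is made up for illustration and is not part of this repository.

```python
import shutil


def missing_tools(tools=("swig", "ffmpeg")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("swig and ffmpeg found on PATH")
```

If either tool is reported missing, install it with your system package manager before running `pip install -r requirements.txt`.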
`python train_ppo.py`. Args:
- `--type`: either `exp` if you want to run a wandb sweep experiment, or `normal` to run with a specific configuration
- `--config`: specify the path of the config file for PPO
- `--env`: either `lunarlander` or `cartpole` (gym environments)
- `--human_preferences`: either `true` or `false`, depending on whether you want to integrate human preferences
- `--reward_ckpt`: specify the reward model checkpoint file if `--human_preferences` is `true`
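To make the relationship between these arguments concrete, here is a rough `argparse` sketch of a command line with the same flags. It is not the actual parser in `train_ppo.py`: the default values, the function names `build_parser` and `parse_args`, and the choice to reject a missing `--reward_ckpt` are all assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags listed above; defaults are assumed.
    parser = argparse.ArgumentParser(description="Sketch of a PPO training CLI")
    parser.add_argument("--type", choices=["exp", "normal"], default="normal")
    parser.add_argument("--config", type=str, default=None)
    parser.add_argument("--env", choices=["lunarlander", "cartpole"], default="cartpole")
    parser.add_argument("--human_preferences", choices=["true", "false"], default="false")
    parser.add_argument("--reward_ckpt", type=str, default=None)
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: a reward checkpoint must be given when human preferences are on.
    if args.human_preferences == "true" and args.reward_ckpt is None:
        parser.error("--reward_ckpt is required when --human_preferences is true")
    return args


if __name__ == "__main__":
    print(parse_args())
```

With a parser like this, passing `--human_preferences true` without `--reward_ckpt` exits with a usage error instead of failing later during training.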
`python train_reinforce.py`. Args:
- `--type`: either `exp` if you want to run a wandb sweep experiment, or `normal` to run with a specific configuration
- `--config`: specify the path of the config file
- `--env`: either `lunarlander` or `cartpole` (gym environments)
`python train_reward.py`.
- PPO Paper: https://arxiv.org/abs/1707.06347
- Intro RL: https://lilianweng.github.io/posts/2018-02-19-rl-overview/
- Policy gradients algorithms (with PPO): https://lilianweng.github.io/posts/2018-04-08-policy-gradient/