REINFORCE - Policy Gradient Theorem Implementation in PyTorch, for 'LunarLander-v2' Gym Env

REINFORCE - Policy Gradient

  • On Policy Learning
  • Advantageous for Continuous State/Action spaces
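As a rough illustration of what REINFORCE optimises, the sketch below computes discounted returns for one episode and the resulting policy-gradient loss. It is not taken from this repository: the function names `discounted_returns` and `reinforce_loss` and the discount factor `gamma=0.99` are assumptions made for the example. The arithmetic is written so that plain floats work, and the same expression should also work if `log_probs` holds PyTorch tensors.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    `gamma` is an assumed discount factor, not a value from this repo.
    """
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma: float = 0.99):
    """Monte Carlo policy-gradient loss: -sum_t log pi(a_t | s_t) * G_t.

    `log_probs` are the log-probabilities of the actions actually taken,
    collected while following the current policy (hence on-policy).
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    episode_log_probs = [-1.0, -2.0, -0.5]
    print(discounted_returns(episode_rewards, gamma=0.5))
    print(reinforce_loss(episode_log_probs, episode_rewards, gamma=0.5))
```

Because the returns come from complete episodes sampled with the current policy, the data cannot be reused after an update, which is what makes the method on-policy.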

Train

Params

  • Env: Choose a compatible env from gym to train the model, default='LunarLander-v2'
  • Learning Rate: Experiment with different learning rates to find the optimal one, default=0.0005
  • logdir: Directory for the TensorBoard logs, used to visualise the ongoing training results
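The parameters above could be parsed along the following lines. This is only a sketch of one possible command-line interface, not the contents of `play.py`: the helper name `build_parser`, the help strings, and the defaults for `--logdir`, `--epochs` and `--chkpt` are assumptions. Only the `--env` and `--lr` defaults come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(description="Train a REINFORCE agent")
    parser.add_argument("--env", type=str, default="LunarLander-v2",
                        help="Gym environment id")
    parser.add_argument("--lr", type=float, default=0.0005,
                        help="Learning rate")
    # The three defaults below are assumed for illustration only.
    parser.add_argument("--logdir", type=str, default=None,
                        help="Directory for TensorBoard logs")
    parser.add_argument("--epochs", type=int, default=10000,
                        help="Number of training epochs")
    parser.add_argument("--chkpt", type=str, default=None,
                        help="Path of the checkpoint file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With a parser like this, `--lr 0.001` and `--lr=0.001` are equivalent, and any flag that is omitted falls back to its default.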

Make checkpoint directories

mkdir -p ./agent/LunarLander-v2

Generate samples

python play.py --env 'LunarLander-v2' --logdir './plays/LunarLander-v2/lr=0.0005' --epochs 10000 --lr=0.0005 --chkpt './agent/LunarLander-v2/lr=0.0005.pt' &&
python play.py --env 'LunarLander-v2' --logdir './plays/LunarLander-v2/lr=0.001' --epochs 10000 --lr=0.001 --chkpt './agent/LunarLander-v2/lr=0.001' 

Analyze the training results using Tensorboard

tensorboard --logdir=./plays/LunarLander-v2/

Results

python play_testing.py --env 'LunarLander-v2' --chkpt './agent/LunarLander-v2/lr=0.0005.pt'
LunarLander-v2.mp4