- On-policy learning
- Well suited to continuous state/action spaces
Params
- Env: Choose a compatible env from gym to train the model, default='LunarLander-v2'
- Learning Rate: Experiment with different learning rates to find the optimal one, default=0.0005
- logdir: Directory for the TensorBoard logs, used to visualise the ongoing training results
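As a rough illustration of how the parameters above could be exposed on the command line, here is a minimal argparse sketch. It is not taken from play.py: the function name `build_parser`, the help strings, and the defaults for `--epochs`, `--logdir` and `--chkpt` are assumptions; only the flag names and the `--env` and `--lr` defaults come from this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(description="Train an agent on a gym environment")
    # Defaults for --env and --lr are the ones documented above.
    parser.add_argument("--env", default="LunarLander-v2",
                        help="name of a compatible gym environment")
    parser.add_argument("--lr", type=float, default=0.0005,
                        help="learning rate")
    # The remaining defaults are assumptions made for this sketch.
    parser.add_argument("--logdir", default=None,
                        help="directory for TensorBoard logs")
    parser.add_argument("--epochs", type=int, default=10000,
                        help="number of training epochs")
    parser.add_argument("--chkpt", default=None,
                        help="path of the checkpoint to write")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Declaring `--lr` with `type=float` means a malformed value is rejected at startup rather than partway through training.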
Make checkpoint directories
mkdir -p ./agent/LunarLander-v2
Generate samples
python play.py --env 'LunarLander-v2' --logdir './plays/LunarLander-v2/lr=0.0005' --epochs 10000 --lr=0.0005 --chkpt './agent/LunarLander-v2/lr=0.0005' &&
python play.py --env 'LunarLander-v2' --logdir './plays/LunarLander-v2/lr=0.001' --epochs 10000 --lr=0.001 --chkpt './agent/LunarLander-v2/lr=0.001'
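When comparing more than a couple of learning rates, it may be easier to generate the training commands than to type them out. The sketch below only builds and prints the command lines; the helper `training_command` is hypothetical, and it assumes the `./plays/<env>/lr=<value>` and `./agent/<env>/lr=<value>` naming pattern used in this README, with no file extension on the checkpoint path.

```python
import shlex


def training_command(env, lr, epochs=10000):
    """Build one play.py invocation; lr is a string so its formatting is kept in paths."""
    lr_tag = f"lr={lr}"
    return [
        "python", "play.py",
        "--env", env,
        "--logdir", f"./plays/{env}/{lr_tag}",
        "--epochs", str(epochs),
        f"--lr={lr}",
        # Checkpoint path without an extension is an assumption of this sketch.
        "--chkpt", f"./agent/{env}/{lr_tag}",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per learning rate.
    for lr in ("0.0005", "0.001"):
        print(shlex.join(training_command("LunarLander-v2", lr)))
```

Each returned list can also be passed directly to `subprocess.run(cmd, check=True)` to launch the runs one after another.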
Analyze the training results using TensorBoard
tensorboard --logdir=./plays/LunarLander-v2/
Results
python play_testing.py --env "LunarLander-v2" --chkpt './agent/LunarLander-v2/lr=0.0005'