jay-karimi/vpg-lunarlander

Vanilla Policy Gradient with Lunar Lander

Results

Agent before & after training:

Training is very noisy:

Todo

Batch episodes for training (the policy is currently updated after every single episode)
Add a baseline and other variance-reduction techniques
Try different algorithms altogether
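
The first two items could be approached roughly as follows. This is a minimal sketch and not the notebook's actual code: the function names `rewards_to_go` and `batch_advantages` are hypothetical, and the choices of a constant mean-return baseline and of dividing by the standard deviation are assumptions made for illustration. It takes the per-step rewards from several episodes, computes the discounted return from each step onward, and then centers the pooled returns so that a single policy update can use the whole batch.

```python
from statistics import fmean, pstdev


def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each step to the end of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def batch_advantages(episodes, gamma=0.99, eps=1e-8):
    """Pool several episodes and subtract a constant baseline.

    `episodes` is a list of per-episode reward lists. The baseline here is
    simply the mean return over the batch (an assumption for this sketch);
    a learned value function is a common alternative.
    """
    flat = [g for rewards in episodes for g in rewards_to_go(rewards, gamma)]
    baseline = fmean(flat)
    scale = pstdev(flat) + eps  # eps avoids division by zero
    return [(g - baseline) / scale for g in flat]


if __name__ == "__main__":
    batch = [[1.0, 1.0, 1.0], [0.0, -1.0]]
    print(batch_advantages(batch, gamma=0.99))
```

The resulting values would replace the raw returns as the weights on the log-probabilities in the policy-gradient loss, with one optimizer step per batch instead of one per episode.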