Vanilla Policy Gradient with Lunar Lander
Results
Agent before & after training:
Training is very noisy:
Todo
- batch several episodes per policy update (currently updating the policy after each episode)
- add a baseline and other variance-reduction techniques
- try different algos altogether
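
A rough sketch of how the first two items might fit together, not the code in this repo: collect the rewards from several episodes, compute the discounted reward-to-go for each step, then subtract the batch mean as a simple baseline and rescale by the batch standard deviation. The names `rewards_to_go` and `batch_advantages` and the default `gamma=0.99` are assumptions made for illustration. The batch-mean baseline is only a common heuristic; a learned value function is the usual next step.

```python
from statistics import mean, pstdev


def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each time step to the end of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def batch_advantages(episodes, gamma=0.99, eps=1e-8):
    """Turn a batch of episodes into baseline-subtracted, normalized weights.

    `episodes` is a list of per-episode reward lists (hypothetical layout).
    The result is one flat list with a weight for every step in the batch,
    in the same order the steps were collected.
    """
    flat = [g for ep in episodes for g in rewards_to_go(ep, gamma)]
    baseline = mean(flat)        # constant baseline: mean return over the batch
    scale = pstdev(flat) + eps   # eps guards against a zero-variance batch
    return [(g - baseline) / scale for g in flat]
```

In the policy update, each weight would multiply the negative log-probability of the action taken at that step, and the loss would be averaged over the whole batch rather than over a single episode.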