In this repository I try out different reinforcement learning algorithms and experiment with them.
I have been playing with Stable-Baselines3 and the LunarLander-v2 environment.
I obtained an average reward of 270 with the PPO algorithm after training for 2e6 (2,000,000) timesteps.
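For reference, here is a minimal sketch of how a PPO run like this can be set up with Stable-Baselines3. It assumes stable-baselines3 >= 2.0 with Gymnasium and Box2D installed, and it uses SB3's default PPO hyperparameters plus 8 parallel environments. The notebook's actual settings are not stated here, so treat those choices as assumptions. The function name `train_and_evaluate` is hypothetical.

```python
from __future__ import annotations

# Minimal sketch, NOT the notebook's exact code.
# Assumptions: stable-baselines3 >= 2.0, gymnasium with Box2D installed,
# SB3 default PPO hyperparameters, 8 parallel training environments.


def train_and_evaluate(
    env_id: str = "LunarLander-v2",
    total_timesteps: int = 2_000_000,
    n_eval_episodes: int = 10,
    seed: int = 0,
) -> tuple[float, float]:
    """Train PPO on env_id, then return (mean, std) of evaluation episode rewards."""
    # Imported inside the function so that merely defining it does not require SB3.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.evaluation import evaluate_policy

    # Several environment copies collect rollouts in parallel (n_envs=8 is an assumption).
    train_env = make_vec_env(env_id, n_envs=8, seed=seed)
    model = PPO("MlpPolicy", train_env, seed=seed, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Evaluate on a separate environment using deterministic (greedy) actions.
    eval_env = make_vec_env(env_id, n_envs=1, seed=seed + 1)
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    return float(mean_reward), float(std_reward)
```

Calling `mean, std = train_and_evaluate()` trains for 2e6 timesteps and reports the average episode reward over 10 evaluation episodes. Newer Gymnasium releases register the environment as `LunarLander-v3`, so pass `env_id="LunarLander-v3"` there. LunarLander is usually considered solved at an average of 200 points.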