AntonioAlgaida/Playground

In this repository I will try different algorithms and play with them.

Language: Jupyter Notebook. License: MIT.

Playground

Playground 0

I have been playing with Stable-Baselines3 and the LunarLander-v2 environment.

Training with the PPO algorithm for 2e6 timesteps gave an average reward of 270.
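A run like this could look roughly like the sketch below. It is not the notebook's actual code: the function name `train_and_evaluate`, the default PPO hyperparameters, the `MlpPolicy` choice and the 10 evaluation episodes are all assumptions made for illustration. It also assumes Stable-Baselines3 and a Gym/Gymnasium build with Box2D support are installed, and that the installed version still registers the `LunarLander-v2` ID.

```python
"""Sketch: train PPO on LunarLander-v2 with Stable-Baselines3."""

ENV_ID = "LunarLander-v2"    # environment named in the text
TOTAL_TIMESTEPS = int(2e6)   # training budget named in the text


def train_and_evaluate(env_id=ENV_ID,
                       total_timesteps=TOTAL_TIMESTEPS,
                       n_eval_episodes=10):
    """Train a PPO agent, then return (mean_reward, std_reward).

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Imported here so the constants above can be read
    # without the RL dependencies installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Default hyperparameters are an assumption; the original
    # settings are not stated. Passing the ID as a string lets
    # Stable-Baselines3 create the environment itself.
    model = PPO("MlpPolicy", env_id, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Average the episode reward over a few evaluation episodes.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=n_eval_episodes
    )
    return mean_reward, std_reward
```

Calling `train_and_evaluate()` runs the full 2e6 timesteps, which takes a while on a CPU; passing a smaller `total_timesteps` first is a cheap way to check that the setup works. The resulting reward will vary with the random seed and library versions.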