/RL-Balancer

Balanacer balances a pole on top of a cart by moving the cart either left or right.

Primary LanguagePythonMIT LicenseMIT

Reinforcement Learning Balancer

Uses REINFORCE policy gradients to perform gradient ascent.
Rewarding details: for every time step that the algorithm has not failed (the pole has neither fallen nor will imminently fall) the algorithm is given a reward of +1.

Requirements:

TensorFlow
NumPy
OpenAI Gym

python -m pip install --upgrade tensorflow numpy gym 

Usage

python main.py --flag1 [value] --flag2 [value] (...)

References:

Williams, R.J. Mach Learn (1992) 8: 229. https://doi.org/10.1007/BF00992696
Peters, Jan. “Policy Gradient Methods.” Scholarpedia, Jan Peters, 2010, www.scholarpedia.org/article/Policy_gradient_methods. Boilerplate and framework borrowed from Chapter 16 of Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron.