This is a Python re-implementation of the PILCO algorithm (originally written in MATLAB), using TensorFlow and GPflow. The work was carried out mainly for personal development, and parts of the code are based on this Python implementation. This repository will mainly serve as a baseline for my future research.
I implemented the cart pole benchmark using MuJoCo and OpenAI Gym. I did this because OpenAI Gym's CartPole environment does not have a continuous action space, and because the InvertedPendulum-v2 environment uses an "inverted" cart pole. The new environment represents the traditional cart pole benchmark with a continuous action space.
The env/cart_pole_env.py file contains the new CartPole class, based on InvertedPendulum-v2. I also created the env/cart_pole.xml file, which defines the MuJoCo model for the traditional cart pole.
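To illustrate the continuous-action cart pole idea without requiring a MuJoCo licence, here is a minimal numpy-only sketch using the classic cart pole equations of motion with a real-valued force input. This is an illustration only; the actual environment in env/cart_pole_env.py is implemented on top of MuJoCo via InvertedPendulum-v2, and the class and parameter names below are my own, not the repository's.

```python
import numpy as np

class ContinuousCartPole:
    """Toy cart pole with a continuous force input (illustration only;
    the repository's env/cart_pole_env.py uses MuJoCo instead)."""

    def __init__(self, dt=0.02):
        self.g = 9.8       # gravity
        self.mc = 1.0      # cart mass
        self.mp = 0.1      # pole mass
        self.l = 0.5       # half pole length
        self.dt = dt
        self.state = np.zeros(4)  # [x, x_dot, theta, theta_dot]

    def reset(self):
        self.state = np.random.uniform(-0.05, 0.05, size=4)
        return self.state.copy()

    def step(self, force):
        # Standard cart pole dynamics with an arbitrary real-valued force,
        # rather than the discrete left/right push of Gym's CartPole.
        x, x_dot, th, th_dot = self.state
        total_m = self.mc + self.mp
        temp = (force + self.mp * self.l * th_dot**2 * np.sin(th)) / total_m
        th_acc = (self.g * np.sin(th) - np.cos(th) * temp) / (
            self.l * (4.0 / 3.0 - self.mp * np.cos(th)**2 / total_m))
        x_acc = temp - self.mp * self.l * th_acc * np.cos(th) / total_m
        # Euler integration over one timestep.
        self.state = self.state + self.dt * np.array([x_dot, x_acc, th_dot, th_acc])
        return self.state.copy()
```

A policy can then pass any force in a continuous range (e.g. -10 N to 10 N) to step, which is the property PILCO's gradient-based policy search relies on.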
The example requires the MuJoCo (Multi-Joint dynamics with Contact) physics engine in order to use OpenAI Gym's Inverted Pendulum simulation environment. I believe free student licences are available.
Install the requirements using `pip install -r requirements`.
- Make sure you use Python 3.
- You may want to use a virtual environment for this.
An example of using the code is given for the cart pole environment and can be found in examples/cart_pole.py.
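For orientation, the overall shape of such an example follows the standard PILCO loop: collect data, fit a dynamics model, improve the policy, repeat. The sketch below shows that structure on a toy 1-D system; all function names here are hypothetical illustrations, not the repository's actual API, and the Gaussian-process model and gradient-based policy optimisation of real PILCO are only indicated in comments.

```python
import numpy as np

def rollout(policy, horizon=25):
    """Run a policy on a toy double-integrator system and record
    (state, action, next_state) transitions for model learning."""
    state = np.zeros(2)  # [position, velocity]
    data = []
    for _ in range(horizon):
        action = policy(state)
        # Toy dynamics standing in for the MuJoCo cart pole.
        next_state = state + 0.02 * np.array([state[1], action])
        data.append((state, action, next_state))
        state = next_state
    return data

def random_policy(state):
    # PILCO starts from a random policy to gather initial data.
    return np.random.uniform(-1.0, 1.0)

# 1. Collect initial data with a random policy.
data = rollout(random_policy)
# In PILCO proper, each iteration would then:
# 2. fit a GP dynamics model to the recorded transitions,
# 3. evaluate the policy by propagating state distributions through the GP,
# 4. improve the policy parameters with gradient-based optimisation,
# 5. and roll out the improved policy to gather more data.
```

The data efficiency of PILCO comes from steps 2-4: the learned GP model lets the policy improve between real rollouts, so far fewer environment interactions are needed than in model-free methods.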
- Aidan Scannell
This project is licensed under the MIT License - see the LICENSE.md file for details.
The original implementation of PILCO:

- M. P. Deisenroth, D. Fox, and C. E. Rasmussen. Gaussian Processes for Data-Efficient Learning in Robotics and Control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
- M. P. Deisenroth and C. E. Rasmussen. PILCO: A Model-based and Data-Efficient Approach to Policy Search. International Conference on Machine Learning (ICML), 2011.
I took inspiration and some code from this Python implementation.