cartpolebalancing

Using Policy Iteration to solve the Cart Pole Balancing problem. A simple 1 hidden layer fully connected neural network is used to evaluate the best action for a given state. Suppose a training episode lasts for k steps. Reward for each step is collected, and discounted return is calculated for each step after the episode ends. (state,discounted return) is stored for each each episode. Backpropogration is done for a batch of episodes, and the process is repeated for a number of batches.

Here's a GIF of the trained AI:

Simulation environment: OpenAI Gym Cartpole-v0

Forward pass and backpropogation done in Theano. Here are good tutorial for getting started with Theano and for implementing a simple ANN.

I used the CPU for this. The Nvidia drivers are a bit tricky to install on Ubuntu 1604 if you have Intel's Skylake. Here's my Theano .theanorc config for CPU:

[global]
floatX = float32
device = cpu
force_device=True
pycuda.init = False

[lib]
cnmem = 1

[blas]
ldflags=-L/usr/lib/ -lblas

litesaber15/cartpolebalancing

cartpolebalancing