GeneticAlgo_OpenAIGymCartPole

Genetic Algorithm is so powerful that model learns very fast. Most of the cases it performs perfectly during testing may be due to search space is not that large compared to other environments. If we do not end environment it will hold the pole for forever. I run environment up to 10000-time steps :D .

Complete version

Here I will explain about my implementation. Let’s start with a brief overview. From cartpole environment we can get observations, awards for each action input (0/1) given to the environment. These observations feed into an artificial neural network which decides what should be the action for next step as shown in below diagram.

We have four observations and one action so neural network consists four input one out. Initially, weights and biases are randomly selected but tuning is required to get perfect action for each time step. Here genetic algorithm plays an important role for the optimal solution of weights and biases. This algorithm works on the survival of the fittest principle so generation(one set of weights and biases) with thebest performance will survive. Above diagram simplified below showing different weight-bias sets associated with different nodes.

Initially, this weights and biases randomly selected then for each selection the cartpole environment run and scores stored until game over. Now sets with top scores selected and arranged in a particular sequence similar to DNA for crossover (in our case swapping two sequences random) to generate next sequence. The image below shows that how DNAs with red and gray dots exchange some porting with each other to generate two new DNAs.

Exchange point can be anywhere as it is chosen randomly. After crossover mutation is done which is also a random update of few of the dots as a brown dot and a red dot become blue and black respectively after mutation. These new generated along with parent DNAs(weights-biases) put inside the neural network and again see the scores from each and then loop-over the evolution process of genetic algorithm again until perfect score achieved.

Results from Three Layer Neural Network

Results from Two Layer Neural Network

MaveriQ/GeneticAlgo_OpenAIGymCartPole

GeneticAlgo_OpenAIGymCartPole