Simple-Random-Search

A simple random search technique that provides a competitive alternative to reinforcement learning for locomotion tasks on MuJoCo bodies such as Humanoid and Half-Cheetah.



Augmented Random Search using Numpy

The project builds a simple artificial-intelligence algorithm that surpasses many existing algorithms on Humanoid and other MuJoCo (Multi-Joint dynamics with Contact) locomotion tasks. It implements Augmented Random Search (ARS) by training a Half-Cheetah (MuJoCo) body to walk and run across a field.

Motivation

Link to Google DeepMind's video

Existing methods

  • Asynchronous Actor-Critic Agents
  • Deep Learning
  • Deep Reinforcement Learning

How is it different

  • Unlike other AI systems, where exploration occurs after each action (action space), here exploration occurs at the end of each episode (policy space)
  • ARS is a shallow learning technique, unlike the deep learning used in other AI systems (it uses a single perceptron rather than layers of them)
  • ARS discards gradient descent for weight adjustment and instead uses the Method of Finite Differences
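The Method of Finite Differences mentioned above can be sketched in a few lines of NumPy: perturb the weights in random directions, compare the rewards of the positive and negative perturbations, and step along the reward difference. This is a minimal illustration, not the repository's exact code; `rollout` is a hypothetical stand-in for running one episode with the given weights, and the hyperparameter values are illustrative.

```python
import numpy as np

def finite_difference_step(theta, rollout, noise_std=0.03, lr=0.02, num_dirs=8):
    """One weight update via the Method of Finite Differences.

    `rollout(theta)` is assumed to return the total episode reward
    obtained with policy weights `theta`.
    """
    deltas = [np.random.randn(*theta.shape) for _ in range(num_dirs)]
    step = np.zeros_like(theta)
    for delta in deltas:
        r_pos = rollout(theta + noise_std * delta)  # reward with +perturbation
        r_neg = rollout(theta - noise_std * delta)  # reward with -perturbation
        step += (r_pos - r_neg) * delta             # weight direction by reward gap
    # average over directions; no gradients are ever computed
    return theta + lr / num_dirs * step
```

Note that only episode rewards are needed, which is why no backpropagation machinery appears anywhere in the algorithm.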

Implementation

Components

  • Perceptrons
  • Reward mechanism and weight updates
  • Method of Finite Differences to find the best possible direction of movement

Algorithm

  • Scaling the update step by the standard deviation of the rewards.
  • Online normalization of states.
  • Choosing better directions for faster learning.
  • Discarding directions that yield the lowest rewards.
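The refinements listed above can be combined into one update step, sketched below: sample many directions, keep only the best ones by max(r+, r-), and divide the step by the standard deviation of the rewards actually used. As before, `rollout` is a hypothetical episode-reward function and the hyperparameters are illustrative, not the repository's settings.

```python
import numpy as np

def ars_update(theta, rollout, n_dirs=16, top_dirs=8, noise=0.03, lr=0.02):
    """One ARS-style update combining the refinements above."""
    deltas = np.random.randn(n_dirs, *theta.shape)
    r_pos = np.array([rollout(theta + noise * d) for d in deltas])
    r_neg = np.array([rollout(theta - noise * d) for d in deltas])
    # keep the directions with the highest max(r+, r-); discard the worst
    order = np.argsort(-np.maximum(r_pos, r_neg))[:top_dirs]
    # standard deviation of the rewards used scales the step size
    r_used = np.concatenate([r_pos[order], r_neg[order]])
    sigma_r = r_used.std() + 1e-8
    step = sum((r_pos[k] - r_neg[k]) * deltas[k] for k in order)
    return theta + lr / (top_dirs * sigma_r) * step
```

Dividing by sigma_r keeps the step size stable whether episode rewards are in the hundreds or near zero, so a single learning rate works across training.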

Algorithm Overview


Installation

  • Fork and clone the repository using git clone https://github.com/ashutoshtiwari13/Simple-Random-Search.git
  • Run pip install -r requirements.txt
  • Also check Simulation.txt for setting up the PyBullet simulation environment
  • Use the Anaconda Cloud - Spyder IDE (or any framework/IDE of your choice)
  • Use Python 3.6 or above
  • Run the command python ars.py

Results

Reference MuJoCo


Series of Rewards

Rewards start as low as -900 and climb to around +900 in roughly 1000 steps.

Simulation Images


Further reading

  • Ben Recht's Blog
  • Reference paper - Link
  • Research paper used - Link

Happy coding 😊 ❤️ ✔️