
SRA Eklavya 2021 project


rlagents

Understanding Reinforcement Learning and Implementing RL agents for OpenAI Gym from Scratch

Table of Contents

  • About The Project
  • Reinforcement Learning
  • Markov Decision Process
  • Multi-Armed Bandit Problem
  • OpenAI gym
  • Tech Stack
  • File Structure
  • Getting Started
  • Usage
  • Results and Demo
  • Future Work
  • Troubleshooting
  • Contributors
  • Acknowledgements and Resources
  • License

Aim:

  • Understand Reinforcement Learning
  • Implement a simple solution to the Multi-Armed Bandit Problem
  • Solve OpenAI Gym environments with RL agents

Theory:

Refer to our documentation for a detailed analysis and a brief overview of the project.


Reinforcement Learning

  • Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.
  • In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions and learn through trial and error.

Some notable examples of RL, and Deep RL in particular:

  • In 2013, the Atari game Breakout took around 36 hours of training with DQN to achieve commendable results; now similar results can be achieved in a matter of hours.
  • The agents created for Dota 2 were able to defeat pro players at their own game, and did really well in the 5v5 matchup!
  • As you can see, DeepMind (by Google) and OpenAI are two organisations with remarkable accomplishments in the field of Reinforcement Learning.

Markov Decision Process


  • The learner and decision maker is called the agent.
  • The thing it interacts with, comprising everything outside the agent, is called the environment.
  • These interact continually, the agent selecting actions and the environment responding to these actions and presenting new situations to the agent.
  • The environment also gives rise to rewards, special numerical values that the agent seeks to maximize over time through its choice of actions.
  • Basically, if you can map a problem you want to solve to an MDP, you can run a reinforcement learning algorithm on it (a toy example follows this list).
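
The agent-environment loop above is easy to express in code. Below is a minimal, hypothetical sketch of a two-state MDP written as plain Python tables; the state names, actions and rewards are made up purely for illustration and are not part of our code.

import random

# A toy deterministic MDP written as plain tables.
# transitions[state][action] -> (next_state, reward)
transitions = {
    "s0": {"stay": ("s0", 0.0), "move": ("s1", 1.0)},
    "s1": {"stay": ("s1", 2.0), "move": ("s0", 0.0)},
}

state = "s0"
total_reward = 0.0
for t in range(10):
    action = random.choice(list(transitions[state]))  # a random policy, for illustration
    state, reward = transitions[state][action]        # the environment responds
    total_reward += reward

print("Return after 10 steps:", total_reward)

An RL algorithm's job is then to replace the random policy with one that maximizes the return.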

Multi-Armed Bandit Problem

  • The multi-armed bandit problem is a classic problem that well demonstrates the exploration vs exploitation dilemma. Imagine you are in a casino facing multiple slot machines and each is configured with an unknown probability of how likely you can get a reward at one play. The question is: What is the best strategy to achieve highest long-term rewards?
  • We are using the epsilon-greedy algorithm to solve this problem

Epsilon Greedy Algorithm

  • The epsilon-greedy algorithm balances exploitation and exploration in a fairly simple way.
  • It takes a parameter, epsilon, between 0 and 1, as the probability of exploring the options (called arms in multi-armed bandit discussions) as opposed to exploiting the current best variant in the test.
  • For example, say epsilon is set at 0.1.
  • Every time a visitor comes to the website being tested, a number between 0 and 1 is randomly drawn. If that number is greater than 0.1, then that visitor will be shown whichever variant (at first, version A) is performing best.
  • If that random number is less than 0.1, then a random arm out of all available options will be chosen and provided to the visitor.
  • The visitor’s reaction will be recorded (a click or no click, a win or a loss, etc.) and the success rate of that arm will be updated accordingly. Low values of epsilon correspond to less exploration and more exploitation; it therefore takes the algorithm longer to discover which arm is best, but once found, that arm is exploited at a higher rate. A minimal sketch of the algorithm follows this list.
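
Below is a minimal sketch of epsilon-greedy on a toy bandit; the arm probabilities and the number of plays are illustrative assumptions, not values taken from our code.

import random

true_probs = [0.3, 0.5, 0.8]        # assumption: 3 arms with hidden Bernoulli reward probabilities
epsilon = 0.1                       # exploration probability, as in the example above
counts = [0] * len(true_probs)      # number of pulls per arm
values = [0.0] * len(true_probs)    # running average reward per arm

for step in range(10000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_probs))   # explore: pick a random arm
    else:
        arm = values.index(max(values))           # exploit: pick the current best arm
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    # incremental update of the sample-average estimate for this arm
    values[arm] += (reward - values[arm]) / counts[arm]

print("Estimated arm values:", [round(v, 2) for v in values])

After enough plays, the estimated values approach the true probabilities, and the greedy choice settles on the best arm.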

OpenAI gym

  • Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball. A short example follows.
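
A minimal random agent on CartPole, sketched with the classic Gym API (env.step() returning (obs, reward, done, info), as in the Gym versions current when this project was written; newer releases changed this interface):

import gym

env = gym.make("CartPole-v0")
obs = env.reset()
done = False
score = 0
while not done:
    action = env.action_space.sample()          # sample a random action
    obs, reward, done, info = env.step(action)  # environment returns the next state and reward
    score += reward
print("Episode score:", score)
env.close()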

Agents used

  • CartPole
  • Mountain-Car
  • LunarLander
  • Self Driving Racing-Car
  • Taxi Driver

Algorithms used

  • Epsilon Greedy
  • PPO
  • Q-Learning
  • DQN
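
As one concrete instance from this list, tabular Q-learning reduces to a one-line update rule. A rough sketch on the Taxi environment, using the classic Gym API and illustrative hyperparameters (not necessarily the ones used in this repo):

import gym
import numpy as np

env = gym.make("Taxi-v3")
q_table = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.1, 0.99, 0.1    # illustrative hyperparameters

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        if np.random.random() < epsilon:
            action = env.action_space.sample()       # explore
        else:
            action = int(np.argmax(q_table[state]))  # exploit
        next_state, reward, done, info = env.step(action)
        # Q-learning update: move Q(s, a) towards r + gamma * max Q(s', .)
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state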

Tech Stack

These are some of the technologies we used in this project.

File Structure

.
├── app.py                  # Explain the function performed by this file in short
├── docs                    # Documentation files (alternatively `doc`)
│   ├── report.pdf          # Project report
│   └── notes               # Folder containing markdown notes of lectures 
├── src                     # Source files (alternatively `lib` or `app`)
│   ├── Training/saved      # Trained Model of CartPole, CarRacing
│   └── Agent Codes         # All the agent codes
├── LICENSE
└── README.md

Getting Started

Prerequisites

  • OpenAI gym
  • Stable-baselines3
    • You can visit the installation section of the Stable-baselines3 docs here
  • Jupyter-notebook

For OpenAI gym:

pip install gym

pip install gym[atari]    # For all Atari dependencies

pip install gym[all]    # For all dependencies

For Stable-baselines3:

pip install stable-baselines3

pip install stable-baselines3[extra]    # Use this if you want extra dependencies like Tensorboard, OpenCV, Atari-py

Note: Some shells such as Zsh require quotation marks around brackets, i.e.

pip install 'gym[all]'
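
To check that everything installed correctly, a quick sanity check from Python:

import gym
import stable_baselines3

print("gym:", gym.__version__)
print("stable-baselines3:", stable_baselines3.__version__)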

Installation

  1. Clone the repo
git clone https://github.com/himanshu-02/rlagents

Usage

Clone the repository.
Run our code in Jupyter Notebook.
You can use our saved models as well; a loading sketch follows.
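
A hedged sketch of loading one of the saved models with Stable-baselines3. The file name below is hypothetical; substitute the actual model path from src/Training/saved, and swap in the matching algorithm class (e.g. DQN.load) if the model was trained with a different algorithm.

import gym
from stable_baselines3 import PPO

# Hypothetical path: adjust to the actual file in src/Training/saved
model = PPO.load("src/Training/saved/ppo_cartpole")

env = gym.make("CartPole-v0")
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)  # query the trained policy
    obs, reward, done, info = env.step(action)
    env.render()
env.close()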

Results and Demo


CartPole

  • Maximum reward of 200 is achieved by the agent


Mountain Car

  • The agent achieves its first clear before 200 episodes every time


Lunar Lander

  • Solved using DQN; good results are achieved after training


Taxi

  • Solved using Q-learning; the agent completes the task perfectly


Car racing

  • Solved using PPO after training for 2 million steps; a highest score of around 700/900 is achieved


Future Work

  • See todo.md to follow the development of this project
  • Creating a custom environment
  • Coding Agents using various Models such as Deep Q-Learning, PPO, etc to train the custom environment and compare the results.
  • Completing more advanced environments available on OpenAI gym

Troubleshooting

  • Make sure you are using the correct environment name
  • In case you missed it, note: some shells such as Zsh require quotation marks around brackets, i.e.

pip install 'gym[all]'

Contributors

Acknowledgements and Resources

License

Distributed under the MIT License. See LICENSE for more information.