
DQN with Prioritized Experience Replay, DDPG for Continuous Environments, DDPG for Multi-Agent Reinforcement Learning


Deep Reinforcement Learning

This repository contains solutions to deep reinforcement learning problems. Different types of agents are utilized according to the properties of each problem.

Detailed READMEs can be found in each project folder.



Environments

The agents were developed on three environments of increasing difficulty: from a discrete action space, to continuous actions, and finally to a multi-agent problem.

Banana Environment

In this challenge a single agent has to collect yellow bananas, while avoiding purple ones.

More information in this folder


Environment Screenshot


Continuous Reacher Environment

In this challenge a single agent has to maintain its end effector on a moving target. Each step that the end effector spends in the target location yields a positive reward.

More information in this folder


Trained Agent


Multi-Agent Continuous Tennis Environment

In this environment two agents play tennis. Each agent receives positive rewards for hitting the ball over the net, and a smaller negative reward if the ball falls on their side.

More information in this folder


Trained Agent

Agents

Deep Q-Learning Agent

This repo contains implementations of two DQN agents in PyTorch:

  • a base agent with a replay buffer, a separate target Q-network, and a Q-network with 2 hidden layers
  • an agent built on top of the base agent, which utilizes Prioritized Experience Replay.
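As a sketch of what the second agent's buffer adds over plain uniform replay, here is a minimal proportional prioritized replay buffer. This is an illustrative reimplementation of the standard technique, not the repo's actual class; all names and default hyperparameters (`alpha`, `beta`, `eps`) are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly to prioritize (0 = uniform sampling)
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are replayed at least once
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-5):
        # Priority is proportional to the magnitude of the TD error
        self.priorities[idx] = np.abs(td_errors) + eps
```

After each learning step, the agent calls `update_priorities` with the new TD errors of the sampled batch, so surprising transitions are replayed more often.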

Deep Deterministic Policy Gradient

This repo contains an implementation of a DDPG agent in PyTorch.

The DDPG architecture is considered by many to be an actor-critic method. In the learning step the agent uses the policy (actor) network, whose estimates have high variance with respect to the actual value, to select the next action for computing the Temporal Difference target, which is biased with respect to the actual value. Combining the two decreases both bias and variance.
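The learning step described above can be sketched as a single critic update, where the target actor picks the next action and the target critic evaluates it to form the TD target. This is a hedged sketch, not the repo's code: the function name, argument order, and network classes are all assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_critic_update(batch, actor_target, critic_target, critic_local,
                       critic_optimizer, gamma=0.99):
    """One DDPG critic update step (illustrative; network classes are assumed)."""
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # Target actor selects the next action; target critic evaluates it
        next_actions = actor_target(next_states)
        q_next = critic_target(next_states, next_actions)
        # TD target: r + gamma * Q'(s', mu'(s')) for non-terminal transitions
        q_targets = rewards + gamma * q_next * (1 - dones)
    q_expected = critic_local(states, actions)
    loss = F.mse_loss(q_expected, q_targets)
    critic_optimizer.zero_grad()
    loss.backward()
    critic_optimizer.step()
    return loss.item()
```

The actor is then updated by ascending the critic's value estimate of the actor's own actions, and the target networks are soft-updated toward the local ones.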

The DDPG agent was able to solve the environment in under 250 episodes.


Multi-Agent Deep Deterministic Policy Gradient

The DDPG Agent has been extended to support multi-agent environments.
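One common way to extend a single-agent DDPG implementation to this setting is to run one agent per player over a shared replay buffer, as in the sketch below. This is an illustrative loop under assumed interfaces (`env`, `agent.act`, `agent.learn`, `buffer.add`); the repo's actual extension may differ.

```python
import numpy as np

def run_episode(env, agents, shared_buffer):
    """Hypothetical multi-agent loop: one DDPG agent per player, shared replay buffer."""
    states = env.reset()
    scores = np.zeros(len(agents))
    while True:
        # Each agent acts on its own observation
        actions = [agent.act(s) for agent, s in zip(agents, states)]
        next_states, rewards, dones = env.step(actions)
        for i, agent in enumerate(agents):
            # Both agents' transitions land in the same buffer
            shared_buffer.add((states[i], actions[i], rewards[i],
                               next_states[i], dones[i]))
            agent.learn(shared_buffer)
        scores += rewards
        states = next_states
        if any(dones):
            return scores
```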

The DDPG agent was able to solve the environment in under 2000 episodes, and reached a maximum overall score, averaged over 100 episodes, of +2.05 by episode 2655. To speed up the hyperparameter tuning phase, an abstract training loop was utilized that could evaluate permutations of hyperparameters during evening hours, when electricity costs are lower.
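The permutation-based tuning mentioned above amounts to a Cartesian product over a hyperparameter grid. A minimal sketch follows; the parameter names and values are illustrative, not the repo's actual search space.

```python
from itertools import product

# Hypothetical grid: names and values are examples only
grid = {
    "lr_actor": [1e-4, 1e-3],
    "lr_critic": [1e-4, 1e-3],
    "tau": [1e-3, 1e-2],
}

def hyperparameter_permutations(grid):
    """Yield every combination of the grid as a dict, one training run per combination."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

A scheduler can then iterate over `hyperparameter_permutations(grid)` and launch one training run per combination in a chosen time window.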


References

Environments and agents were both based on starter code from the Udacity Deep Reinforcement Learning Nanodegree. The GitHub repo can be found here.

Special thanks to Miguel Morales for writing such a comprehensive book on Deep Reinforcement Learning; it clarified many points of theory and practice while I was developing the agents. His book can be found here.