Deep Reinforcement Learning Algorithms

This repository contains implementations of various deep reinforcement learning algorithms completed as part of the Spring 2017 offering of CS 294-112, UC Berkeley's Deep Reinforcement Learning course.

Disclaimer: The code contained in this repository may or may not relate to coursework in future offerings of CS 294-112. The implementations here are provided for educational purposes only; if you are a student in the course, I highly suggest attempting the problems yourself.

Dependencies

The dependencies of the algorithms include:

TensorFlow
Keras
NumPy
OpenAI Gym
MuJoCo (a paid library, but a free student license is available)

HW1: Imitation Learning and DAgger on MuJoCo

I implemented behavior cloning on multiple MuJoCo environments: rollouts generated by expert policies serve as training data for a feedforward neural network. In addition to plain behavior cloning, I implemented the DAgger (Dataset Aggregation) algorithm, which performed significantly better. Finally, I varied the number of expert rollouts used for training and observed that, as expected, more rollouts produce better results.
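The loop below is a minimal sketch of DAgger, not the code in this repository. It assumes a hypothetical 1-D toy system in place of a MuJoCo environment, a hypothetical linear expert, and a linear least-squares policy in place of the feedforward network. The first fit is plain behavior cloning on expert rollouts. Each later iteration runs the current learner, has the expert label the states the learner actually visits, aggregates those labels into the dataset, and retrains on everything collected so far.

```python
import numpy as np

# Hypothetical toy setup: a 1-D linear system stands in for a MuJoCo task,
# and a least-squares linear policy stands in for the feedforward network.

def expert_policy(states):
    """Hypothetical expert: a linear controller that drives the state toward zero."""
    return -0.5 * states

def rollout(policy, rng, horizon=20):
    """Run one episode of the toy dynamics s' = s + a + noise; return visited states."""
    s = rng.normal()
    states = []
    for _ in range(horizon):
        states.append(s)
        s = s + policy(np.array([s]))[0] + 0.01 * rng.normal()
    return np.array(states)

def fit_policy(states, actions):
    """Supervised (behavior cloning) step: least-squares fit of action = w * state."""
    w = np.dot(states, actions) / np.dot(states, states)
    return lambda s: w * s

def dagger(n_iters=5, n_rollouts=10, seed=0):
    rng = np.random.default_rng(seed)
    # Iteration 0 is plain behavior cloning on expert rollouts.
    states = np.concatenate([rollout(expert_policy, rng) for _ in range(n_rollouts)])
    actions = expert_policy(states)
    policy = fit_policy(states, actions)
    for _ in range(n_iters):
        # Roll out the learner, then ask the expert to label the states it reached.
        new_states = np.concatenate([rollout(policy, rng) for _ in range(n_rollouts)])
        states = np.concatenate([states, new_states])                # aggregate dataset
        actions = np.concatenate([actions, expert_policy(new_states)])
        policy = fit_policy(states, actions)                          # retrain on all data
    return policy, len(states)
```

Calling `dagger()` returns the final policy and the size of the aggregated dataset (here 6 rounds of 10 rollouts of 20 steps). Because the toy expert is linear and its labels are noise-free, the learner recovers the expert's gain of -0.5; in a real task the benefit of DAgger comes from labeling the states the learner drifts into, which the expert's own rollouts never cover.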

HW2: Policy Iteration and Value Iteration for Markov Decision Processes (MDPs)

This is a fairly straightforward implementation of Policy Iteration and Value Iteration on a simple gridworld environment.
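Both algorithms can be sketched on a tiny MDP. The example below assumes a hypothetical 1x4 corridor gridworld (not the gridworld used in the homework) with a discount factor of `GAMMA = 0.9`. Value iteration repeats the Bellman optimality backup until the values stop changing, while policy iteration alternates exact policy evaluation (a linear solve) with greedy policy improvement.

```python
import numpy as np

# Hypothetical 1x4 corridor: states 0..3, actions 0 = left, 1 = right.
# Entering state 3 (the goal) yields reward 1; state 3 is absorbing with reward 0.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s == 3:
        return 3, 0.0
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, 3)
    return s2, (1.0 if s2 == 3 else 0.0)

def q_values(V):
    """One-step lookahead Q(s, a) = r + GAMMA * V(s') for every state-action pair."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            s2, r = step(s, a)
            Q[s, a] = r + GAMMA * V[s2]
    return Q

def value_iteration(tol=1e-10):
    V = np.zeros(N_STATES)
    while True:
        V_new = q_values(V).max(axis=1)  # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, q_values(V_new).argmax(axis=1)
        V = V_new

def policy_iteration():
    pi = np.zeros(N_STATES, dtype=int)  # start with "always move left"
    while True:
        # Policy evaluation: solve V = r_pi + GAMMA * P_pi V exactly.
        P, r = np.zeros((N_STATES, N_STATES)), np.zeros(N_STATES)
        for s in range(N_STATES):
            s2, r[s] = step(s, pi[s])
            P[s, s2] = 1.0
        V = np.linalg.solve(np.eye(N_STATES) - GAMMA * P, r)
        # Policy improvement: act greedily with respect to V.
        new_pi = q_values(V).argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return V, pi
        pi = new_pi
```

Both functions return the same values, `[0.81, 0.9, 1.0, 0.0]`, and a policy that moves right in every non-goal state. Value iteration performs cheap backups many times, while policy iteration does more work per step (a linear solve) but typically converges in very few policy updates.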

HW3: Deep Q-Networks on Atari Games

I implemented the DQN algorithm on the Pong Atari environment in OpenAI Gym. Training on raw pixel observations gave better results than training on the RAM state alone.
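The core of the DQN update can be sketched in NumPy. This illustrates the standard components (an experience replay buffer, Bellman targets computed from a separate target network, and a Huber loss on the TD error); it is not the TensorFlow code in this repository. `ReplayBuffer`, `dqn_targets`, and `td_loss` are hypothetical names, and the Q-value arrays stand in for the outputs of the online and target networks.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size store of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniformly sample distinct transitions to break temporal correlations.
        idx = random.sample(range(len(self.buffer)), batch_size)
        obs, act, rew, next_obs, done = map(np.array, zip(*(self.buffer[i] for i in idx)))
        return obs, act, rew, next_obs, done.astype(np.float32)

    def __len__(self):
        return len(self.buffer)

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrapping past terminal states."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def huber(x, delta=1.0):
    """Huber loss: quadratic for |x| <= delta, linear beyond."""
    abs_x = np.abs(x)
    return np.where(abs_x <= delta, 0.5 * x ** 2, delta * (abs_x - 0.5 * delta))

def td_loss(q_online, actions, targets):
    """Mean Huber loss between Q(s, a) for the actions taken and the fixed targets."""
    q_taken = q_online[np.arange(len(actions)), actions]
    return huber(q_taken - targets).mean()
```

In a full agent, the gradient of this loss is taken with respect to the online network only, and the target network's weights are copied from the online network periodically so that the targets stay fixed between copies.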

HW4: Policy Gradients

I extended the existing Policy Gradients implementation, which handled only discrete action spaces, to Pendulum, an OpenAI Gym environment with a continuous action space. I also used a neural network to learn the value function.
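Moving to continuous actions mainly changes the policy's distribution. A minimal NumPy sketch of the pieces involved is below: discounted reward-to-go, the log-density of a diagonal Gaussian policy, and a policy-gradient estimate that subtracts a learned value function as a baseline. It assumes a linear policy mean (`obs @ W`) with a state-independent `log_std` in place of a neural network, and takes the value network's predictions as a precomputed `values` array. Advantage normalization is a common choice, not necessarily the one used here, and all function names are hypothetical.

```python
import numpy as np

def reward_to_go(rewards, gamma):
    """Discounted return from each timestep onward: sum over t' >= t of gamma^(t'-t) * r_t'."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def gaussian_log_prob(actions, mean, log_std):
    """Log-density of a diagonal Gaussian policy, summed over action dimensions."""
    std = np.exp(log_std)
    z = (actions - mean) / std
    return np.sum(-0.5 * z ** 2 - log_std - 0.5 * np.log(2 * np.pi), axis=-1)

def linear_gaussian_pg(obs, actions, returns, values, W, log_std):
    """Gradient w.r.t. W of mean(log pi(a|s) * advantage) for a policy with mean = obs @ W."""
    adv = returns - values                            # value-function baseline
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)     # normalize advantages
    mean = obs @ W
    # d/dmean log N(a | mean, std^2) = (a - mean) / std^2
    dlogp_dmean = (actions - mean) / np.exp(2 * log_std)
    # Chain rule through mean = obs @ W, averaged over the batch.
    return obs.T @ (dlogp_dmean * adv[:, None]) / len(obs)
```

Ascending this gradient (for example `W += lr * grad`) raises the log-probability of actions whose return exceeded the value estimate and lowers it for the rest; the baseline leaves the gradient unbiased while reducing its variance.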

Final Project

The code for this project has not been released yet, but my writeup can be found here.
