This repository contains implementations of various deep reinforcement learning algorithms completed as part of the Spring 2017 offering of CS 294-112, UC Berkeley's Deep Reinforcement Learning course.
Disclaimer: The code contained in this repository may or may not relate to coursework in future offerings of CS 294-112. The implementations here are provided for educational purposes only; if you are a student in the course, I strongly recommend attempting the problems yourself first.
The implementations depend on the following libraries:
- TensorFlow
- Keras
- NumPy
- OpenAI Gym
- MuJoCo [Paid library, but there is a free student license]
I implemented behavior cloning on multiple MuJoCo environments. Rollouts from expert policies serve as training data for a feedforward neural network. In addition to plain behavior cloning, I implemented the DAgger algorithm, which performs significantly better. Finally, I varied the number of rollouts used to train the agent and observed that, as expected, more training rollouts produce better results.
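The difference between plain behavior cloning and DAgger can be sketched in a few lines. This is a minimal illustration, not the code in this repository: `expert_policy`, `step`, and the linear least-squares learner are hypothetical stand-ins for the MuJoCo experts, the simulator, and the feedforward network.

```python
import numpy as np

def expert_policy(obs):
    # Hypothetical expert: a fixed linear controller (assumption, stands in
    # for a pre-trained MuJoCo expert policy).
    return -0.5 * obs

def step(obs, action):
    # Hypothetical toy dynamics (assumption, stands in for the simulator).
    return 0.9 * obs + action

def rollout(policy, horizon=20, seed=0):
    # Run `policy` from a random start and return the observations it visits.
    rng = np.random.default_rng(seed)
    obs = rng.normal(size=2)
    visited = []
    for _ in range(horizon):
        visited.append(obs)
        obs = step(obs, policy(obs))
    return np.array(visited)

def fit_linear_policy(observations, actions):
    # Supervised learning step: least-squares fit of actions ~ observations @ W.
    W, *_ = np.linalg.lstsq(observations, actions, rcond=None)
    return (lambda obs: obs @ W), W

def dagger(n_iters=5):
    # Iteration 0 is plain behavior cloning: train on the expert's own rollout.
    obs_data = rollout(expert_policy, seed=0)
    act_data = expert_policy(obs_data)
    policy, W = fit_linear_policy(obs_data, act_data)
    for i in range(1, n_iters):
        # Visit states with the *learned* policy, label them with the expert,
        # aggregate with all earlier data, and retrain.
        new_obs = rollout(policy, seed=i)
        obs_data = np.vstack([obs_data, new_obs])
        act_data = np.vstack([act_data, expert_policy(new_obs)])
        policy, W = fit_linear_policy(obs_data, act_data)
    return W, len(obs_data)
```

In this toy setup the expert's own rollout stays on a single line through the origin, so plain behavior cloning only sees states along that line. The states visited by the learned policy in later iterations cover other directions, which is the distribution-shift problem DAgger is meant to address.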
This is a fairly straightforward implementation of Policy Iteration and Value Iteration on a simple gridworld environment.
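As a rough illustration of value iteration, here is a sketch on a one-row gridworld (a chain of states with a goal at the right end). The environment, the `make_chain` helper, and the tabular `P`/`R` layout are assumptions made for this example and do not come from the repository.

```python
import numpy as np

def make_chain(n_states=4):
    # Hypothetical one-row gridworld: actions are 0 = left, 1 = right.
    # The rightmost state is an absorbing goal; entering it gives reward 1.
    n_actions = 2
    P = np.zeros((n_states, n_actions, n_states))  # P[s, a, s'] transition probs
    R = np.zeros((n_states, n_actions))            # R[s, a] expected reward
    goal = n_states - 1
    for s in range(n_states):
        if s == goal:
            P[s, :, s] = 1.0  # absorbing, zero reward
            continue
        left, right = max(s - 1, 0), s + 1
        P[s, 0, left] = 1.0
        P[s, 1, right] = 1.0
        if right == goal:
            R[s, 1] = 1.0
    return P, R

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    # Repeatedly apply the Bellman optimality backup until V stops changing.
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)      # shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # values and greedy policy
        V = V_new
```

Policy iteration uses the same `P` and `R` tables but alternates a full policy evaluation step with a greedy improvement step instead of taking the max inside every backup.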
I implemented the DQN algorithm on the Pong Atari environment in OpenAI Gym. Training on pixel observations gives better results than training on RAM data alone.
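The core of a DQN update is the bootstrapped target. The sketch below shows that computation in NumPy for a sampled batch, assuming a separate target network supplies `next_q_target` as in the standard DQN formulation; the function and argument names are hypothetical, and the repository's TensorFlow code may be organized differently.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    # y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps.
    # next_q_target: (batch, n_actions) Q-values of the next states.
    # dones: (batch,) array with 1.0 where the episode ended, else 0.0.
    max_next_q = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q

def td_errors(q_values, actions, targets):
    # Difference between the target and the Q-value of the action actually taken.
    chosen_q = q_values[np.arange(len(actions)), actions]
    return targets - chosen_q
```

The training loss is then some function of these TD errors, commonly a squared or Huber loss.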
I extended the existing discrete-action policy gradient implementation to Pendulum, a continuous-action environment in OpenAI Gym. In addition, I used a neural network to learn the value function.
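Moving from discrete to continuous actions mainly changes how the log-probability of an action is computed. A common choice, assumed here rather than taken from the repository, is a diagonal Gaussian policy; the learned value function then serves as a baseline for the returns. All function names below are hypothetical.

```python
import numpy as np

def gaussian_log_prob(actions, means, log_std):
    # Log-density of `actions` under a diagonal Gaussian with the given means
    # and log standard deviations, summed over action dimensions.
    std = np.exp(log_std)
    z = (actions - means) / std
    return np.sum(-0.5 * z ** 2 - log_std - 0.5 * np.log(2 * np.pi), axis=-1)

def discounted_returns(rewards, gamma):
    # Reward-to-go: G_t = r_t + gamma * G_{t+1}, computed backwards in time.
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(rewards, value_preds, gamma=0.99):
    # Subtract the value-function baseline from the empirical returns.
    return discounted_returns(rewards, gamma) - value_preds
```

The policy gradient surrogate objective is then the mean of `gaussian_log_prob(...) * advantages(...)` over the batch, with the value network regressed toward the discounted returns.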
The code for this project has not been released yet, but my writeup can be found here.