
Homework for the Cutting-Edge of Deep Learning (CEDL) course at NTHU.


Homework 3: Policy Gradient

In this homework, you will use a neural network to learn a parameterized policy that can select actions without consulting a value function. A value function may still be used to learn the policy weights, but it is not required for action selection.
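To make "parameterized policy" concrete, here is a minimal pure-Python sketch of a linear-softmax policy over a discrete action space such as CartPole's two actions. The weight layout `theta` (one weight vector per action) and all function names are hypothetical illustrations, not the notebook's code, which uses a TensorFlow neural network rather than a linear model. Actions are sampled directly from pi(a|s) with no value function involved; the score function grad log pi(a|s) is what a policy-gradient update later multiplies by a return or advantage.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, state):
    """pi(a | s) for a linear-softmax policy (hypothetical parameterization).

    theta holds one weight vector per action; the logit of action a
    is the dot product of theta[a] with the state vector.
    """
    logits = [sum(w * x for w, x in zip(theta_a, state)) for theta_a in theta]
    return softmax(logits)


def select_action(theta, state, rng=random):
    """Sample an action directly from the policy; no value function is consulted."""
    probs = action_probs(theta, state)
    u = rng.random()
    cumulative = 0.0
    for a, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return a
    return len(probs) - 1  # guard against floating-point round-off


def grad_log_prob(theta, state, action):
    """Score function: gradient of log pi(action | state) w.r.t. theta.

    For a linear-softmax policy, d/d theta[b] = state * (1[b == action] - pi(b | s)).
    """
    probs = action_probs(theta, state)
    return [
        [x * ((1.0 if b == action else 0.0) - probs[b]) for x in state]
        for b in range(len(theta))
    ]
```

With all-zero weights the policy is uniform: `action_probs([[0.0] * 4, [0.0] * 4], state)` returns `[0.5, 0.5]` for any 4-dimensional state, so the agent starts out exploring both actions equally.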

Policy-based algorithms have several advantages:

  • They offer natural ways of dealing with continuous action spaces.
  • For some tasks, the policy function is simpler and thus easier to approximate.
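As an illustration of the first point (assumed here and not part of this homework, since CartPole-v0 has a discrete action space), a common choice for continuous actions is a Gaussian policy: a network would output the mean, and the log standard deviation is a learned parameter. The function names below are hypothetical. The sketch shows that sampling and the log-probability gradient stay simple even though the action set is infinite.

```python
import math
import random


def gaussian_policy_sample(mean, log_std, rng=random):
    """Sample a continuous action a ~ N(mean, std^2), with std = exp(log_std)."""
    return rng.gauss(mean, math.exp(log_std))


def gaussian_log_prob(action, mean, log_std):
    """log N(action; mean, std^2) for a 1-D Gaussian policy."""
    std = math.exp(log_std)
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


def gaussian_grad_mean(action, mean, log_std):
    """d log pi(action) / d mean = (action - mean) / std^2."""
    std = math.exp(log_std)
    return (action - mean) / (std * std)
```

In a full implementation, `gaussian_grad_mean` would be chained through the network that produces `mean`, exactly like the score function of a discrete policy.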

Introduction

We will use CartPole-v0 as the environment in this homework. The following GIF visualizes CartPole:

For further description, please see here

Setup

  • Python 3.5.3
  • OpenAI gym
  • tensorflow
  • numpy
  • matplotlib
  • ipython

We encourage you to install Anaconda or Miniconda on your laptop to avoid tedious dependency problems.

For lazy people:

conda env create -f environment.yml
source activate cedl
# deactivate when you want to leave the environment
source deactivate
# on conda >= 4.4, use `conda activate cedl` and `conda deactivate` instead

TODO

  • [60%] Problems 1, 2, 3: Policy gradient
  • [20%] Problem 5: Baseline bootstrapping
  • [10%] Problem 6: Generalized Advantage Estimation
    • For lazy people, you can refer to here
  • [10%] Report
  • [5%] Bonus: share your code and what you learned on GitHub or your personal blog, such as this
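The quantities behind these problems can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the notebook's interface: the function names are hypothetical, rewards and value estimates are per-episode lists, `gamma` is the discount factor, and `lam` is the GAE lambda. `discounted_returns` gives the Monte-Carlo returns used by vanilla policy gradient (REINFORCE), `advantages_with_baseline` subtracts a baseline such as a learned value estimate, `gae` computes Generalized Advantage Estimation, and `surrogate_loss` is a loss whose minimization by gradient descent performs policy-gradient ascent.

```python
def discounted_returns(rewards, gamma):
    """R_t = sum_{k>=t} gamma^(k-t) * r_k, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages_with_baseline(returns, baseline_values):
    """A_t = R_t - b(s_t); a state-dependent baseline reduces gradient variance."""
    return [r - b for r, b in zip(returns, baseline_values)]


def gae(rewards, values, gamma, lam, last_value=0.0):
    """Generalized Advantage Estimation.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = sum_{l>=0} (gamma * lam)^l * delta_{t+l}
    last_value is V(s_T) after the final step (0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def surrogate_loss(log_probs, advantages):
    """-mean(log pi(a_t | s_t) * A_t); its negative gradient is the policy-gradient estimate."""
    return -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(log_probs)
```

With `lam=1.0` and `last_value=0.0`, `gae` reduces to discounted returns minus the baseline values; with `lam=0.0` it reduces to one-step TD errors. Values of `lam` in between trade variance (high at 1.0) against bias (high at 0.0).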

Other

  • Deadline: 11/2 23:59, 2017
  • Some of the code is credited to Yen-Chen Lin 😄
  • Office hours: 2-3 pm in Room 711, EECS Building (資電館), with Yuan-Hong Liao.
  • Contact andrewliao11@gmail.com for bug reports or any other questions.