##Reading Group on Reinforcement Learning topics
##NYU, Fall 2016
###Logistics
- Meetings run every Wednesday at 9:30 am (before the CILVR lab meeting) in the large conference room at 715/719 Broadway, 12th floor. Breakfast will be provided.
- Paper discussion and review plan: each week we will assign one or two papers to volunteers, who will present them the following week. During the reading/presentation, we will edit a review of each paper, which will be posted here. [also subject to change]
- Guest speakers. We will try to invite RL experts (e.g. G. Tesauro) with some frequency.
- Other communication channels (Facebook groups? Slack?) [TBD].
###Organization
The RG is initially organized by J. Bruna, K. Cho, S. Sukhbaatar, K. Ross, and D. Sontag, with help from the rest of the CILVR group.
- 9/21: Tutorial on MDPs, Policy Gradient (part 1). [Keith Ross]
  - Markov Decision Process Paradigm
  - Discounted and average cost criteria
  - Model-free Reinforcement Learning Paradigm
  - Policy Gradient: parameterized policies; policy gradient theorem; Monte Carlo Policy Gradient (REINFORCE)
  - Using Policy Gradient and deep neural networks to learn the Atari game "Pong".
- 9/28: Tutorial on MDPs, Policy Gradient (part 2). [Keith]
  - Dynamic Programming equations for MDPs
  - Policy iteration
  - Value iteration
  - Monte Carlo methods for RL
  - Q-learning for RL
- 10/5 and 10/12: Actor-Critic. [Martin]
  - Deterministic Policy Gradient
  - Off-Policy variants
  - Relevant Papers:
- 10/19: Tutorial on OpenAI Gym and MazeBase. Also, Twitter's new twrl. [Sainaa and Ilya]
  - MazeBase: https://github.com/facebook/MazeBase
- 10/26: "Apprenticeship Learning via Inverse Reinforcement Learning" and "Model-Free Imitation Learning with Policy Optimization". [Arthur]
- 10/31: Trust region policy optimization (TRPO). [Elman, Ilya]
- Guided Policy Search
- Value Iteration Networks
- TRPO [Elman, early November]
- Review of recent hierarchical reinforcement learning papers [Sainaa]
- Intrinsically Motivated Reinforcement Learning [Martin?]:
- High dimensional action spaces:
- Stability in RL (these 4 papers shouldn't take more than 1 or 2 lectures):
  - Playing Atari with Deep Reinforcement Learning
  - Human-level control through deep reinforcement learning
  - Double Q-learning. Also consider reading about the optimizer's curse to make the reading simpler.
  - Prioritized Experience Replay
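
As optional preparation for the Double Q-learning reading, the optimizer's curse can be seen in a few lines of code. The sketch below is only an illustration under simplifying assumptions, not the algorithm from the paper: every action has a true value of 0, rewards are unit-variance Gaussian noise, and the function name `max_bias_demo` and all of its parameters are hypothetical. It compares taking the max over a single set of noisy estimates (select and evaluate with the same numbers) with the double-estimator idea (select with one set, evaluate with an independent set).

```python
import random


def max_bias_demo(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Estimate max_a Q(a) when every true Q(a) is 0 (so the true max is 0).

    Returns the average single-estimator value and the average
    double-estimator value over n_trials independent repetitions.
    """
    rng = random.Random(seed)

    def noisy_estimates():
        # Sample mean of n_samples N(0, 1) rewards for each action.
        return [
            sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]

    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        est_a = noisy_estimates()
        est_b = noisy_estimates()  # independent second set of estimates

        # Single estimator: the same noisy values pick the action and score it.
        single_total += max(est_a)

        # Double estimator: pick the action with A, score it with B.
        best = max(range(n_actions), key=lambda a: est_a[a])
        double_total += est_b[best]

    return single_total / n_trials, double_total / n_trials


if __name__ == "__main__":
    single, double = max_bias_demo()
    print(f"single estimator: {single:.3f}  (true max is 0)")
    print(f"double estimator: {double:.3f}  (true max is 0)")
```

With these settings the single estimator comes out clearly positive even though no action is actually better than any other, while the double estimator stays close to 0. Decoupling action selection from action evaluation in this way is the intuition to keep in mind when reading about the Double Q-learning update.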