DRL-Agents

Research and implementations of Deep RL agents and their applications.

Primary language: Python. License: MIT.


Contents


RL Landscape

[Image: https://planspace.org/20170830-berkeley_deep_rl_bootcamp/img/annotated.jpg]


[Image: reinforcement-learning]

Source: eleurent/phd-bibliography


RL Agents Implementation

  • Value Optimization
    • [QR-DQN]
    • [DQN] - [Slides] [Code] [rainbow]
    • [Bootstrapped DQN]
    • [DDQN]
    • [NEC]
    • [MMC]
    • [N-step Q Learning]
    • [PAL]
    • [Categorical DQN]
    • [NAF]
  • Policy Optimization
    • [Policy Gradient]
    • [Actor Critic]
      • [DDPG] [Code]
        • [HAC DDPG]
        • [DDPG with HER]
      • [Clipped PPO]
      • [PPO]
  • [DFP]
  • Imitation
    • [Behavioural cloning]
    • [Inverse Reinforcement Learning] [Code] [irl-imitation-code]
    • [Generative Adversarial Imitation Learning]
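
The value-optimization entries above include both [DQN] and [DDQN]. As a rough illustration of how the two differ, here is a minimal sketch of their one-step bootstrap targets for a single transition. The function and argument names are hypothetical and are not taken from this repository's code; Q-values are passed in as plain lists rather than produced by a network.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN target: r + gamma * max_a Q_target(s', a).

    next_q_target: Q-values of the next state from the target network
    (hypothetical input; a real agent would compute these with a network).
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    not_done = 0.0 if done else 1.0
    # Greedy action according to the online network's estimates.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * not_done * next_q_target[best_action]


# Toy transition with two actions (values are made up for illustration).
print(dqn_target(1.0, [3.0, 1.0], done=False))
print(ddqn_target(1.0, [0.5, 2.0], [3.0, 1.0], done=False))
```

On this toy transition the DQN target is about 3.97, while the Double DQN target is about 1.99, because the online network prefers the second action and the target network values that action lower. Decoupling selection from evaluation in this way is what Double DQN uses to reduce overestimation.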

Value Optimization Agents

Policy Optimization Agents

General Agents

Imitation Learning Agents

  • Behavioral Cloning (BC) (code)

Hierarchical Reinforcement Learning Agents

Memory Types

Exploration Techniques


RL History

  • Temporal difference (TD) learning (1988)
  • Q‐learning (1989)
  • BayesRL (2002)
  • RMAX (2002)
  • CBPI (2002)
  • PEGASUS (2002)
  • Least‐Squares Policy Iteration (2003)
  • Fitted Q‐Iteration (2005)
  • GTD (2009)
  • UCRL (2010)
  • REPS (2010)
  • DQN (2013) - DeepMind

RL Environments

  • [Acrobot]
  • [Bike]
  • [Blackjack]
  • [Cartpole]
  • [ContextBandit]
  • [Continuous Chain]
  • [Corridor]
  • [Discrete Chain]
  • [Discretiser (for continuous environments)]
  • [Double Loop]
  • [Environment]
  • [Gridworld]
  • [Inventory management]
  • [Linear context bandit]
  • [Linear dynamic quadratic]
  • [Mountaincar (2d and 3d)]
  • [POMDP Maze]
  • [Optimistic Task]
  • [Puddleworld]
  • [Random MDPs]
  • [Riverswim]

RL Mechanisms

  • [Attention and Memory]
  • [Unsupervised learning]
    • [GANs]
    • [GQN]
    • [UNREAL]
  • [Hierarchical RL]
    • [FuNs]
    • [Option-Critic]
    • [STRAW]
    • [h-DQN]
    • [Stochastic Neural Networks]
  • [Multi-agent RL]
  • [Relational RL]
  • [Learning to Learn, a.k.a. Meta-Learning]
    • [Few/One/Zero-shot Learning]
      • [MAML]
    • [Transfer and Multi-Task Learning]
    • [Learning to Optimize]
    • [Learning to Reinforcement Learn]
    • [Learning Combinatorial Optimization]
    • [AutoML]

RL Games

  • Chinook (1997; 2007) for checkers
  • Deep Blue (2002) for chess
  • Logistello (1999) for Othello
  • TD-Gammon (1994) for backgammon
  • GIB (2001) for contract bridge
  • MoHex (2017) for Hex
  • DQN (2016; 2018) for Atari 2600 games
  • AlphaGo (2016a) and AlphaGo Zero (2017) for Go
  • AlphaZero (2017) for chess, shogi, and Go
  • Cepheus (2015), DeepStack (2017), and Libratus (2017a; b) for heads-up Texas Hold’em poker
  • Jaderberg et al. (2018) for Quake III Arena Capture the Flag
  • OpenAI Five for Dota 2 at 5v5 (https://openai.com/five/)
  • Zambaldi et al. (2018), Sun et al. (2018), and Pang et al. (2018) for StarCraft II


  • [Board Games]
    • [Computer Go]
    • [AlphaGo: Training pipeline with MCTS]
    • [AlphaGo Zero]
    • [AlphaZero]
  • [Card Games]
    • [DeepStack]
  • [Video Games]
    • [Atari 2600 games]
    • [StarCraft]
    • [StarCraft II mini-games]
    • [Quake III Arena]
    • [Minecraft]
    • [Super Smash Bros]
    • [Doom]
    • [ViZDoom]

DRL applied to Robotics

  • [Sim-to-Real]
    • [MuJoCo]
  • [Imitation Learning]
  • [Value-based Learning]
  • [Policy-based Learning]
  • [Model-based Learning]
  • [Autonomous Driving Vehicles]

DRL applied to NLP

  • [Sequence Generation]
  • [Machine Translation]
  • [Dialogue Systems]

DRL applied to Vision

  • [Recognition]
  • [Motion Analysis]
  • [Scene Understanding]
  • [Vision + NLP]
  • [Visual Control]
  • [Interactive Perception]

References



Maintainer

Gopala KR / @gopala-kr