| Title | Venue | arXiv | Note |
| --- | --- | --- | --- |
| Trust Region Policy Optimization | ICML2015 | 1502.05477 | policy-based |
| The Option-Critic Architecture | AAAI2017 | 1609.05140 | HRL, option-critic |
| Learning to Act by Predicting the Future | ICLR2017 | 1611.01779 | VizDoom |
| Meta Networks | ICML2017 | 1703.00837 | meta-learning, MetaNet, few-shot classification |
| FeUdal Networks for Hierarchical Reinforcement Learning | ICML2017 | 1703.01161 | FeUdal Networks (FuN), HRL |
| Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks | ICML2017 | 1703.03400 | meta-learning, MAML |
| Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World | IROS2017 | 1703.06907 | sim-to-real, domain randomization |
| One-Shot Imitation Learning | NIPS2017 | 1703.07326 | imitation, demonstration |
| Multi-Level Discovery of Deep Options | - | 1703.08294 | DDO, HRL |
| DART - Noise Injection for Robust Imitation Learning | CoRL2017 | 1703.09327 | imitation learning, noise injected into demonstrations for robustness |
| Stochastic Neural Networks for Hierarchical Reinforcement Learning | ICLR2017 | 1704.03012 | HRL, stochastic NN |
| Deep Q-learning from Demonstrations | AAAI2018 | 1704.03732 | DQfD: imitation + RL, discrete actions |
| Parameter Space Noise for Exploration | ICLR2018 | 1706.01905 | OpenAI NoisyNet |
| Noisy Networks for Exploration | ICLR2018 | 1706.10295 | DeepMind NoisyNet, part of Rainbow |
| Deep Reinforcement Learning from Human Preferences | NIPS2017 | 1706.03741 | RL + human feedback (easier to provide than demonstrations) |
| Hindsight Experience Replay | NIPS2017 | 1707.01495 | HER: goal-based envs, sparse rewards, learn from failed episodes (see the sketch after this table) |
| Emergence of Locomotion Behaviours in Rich Environments | - | 1707.02286 | PPO |
| Robust Imitation of Diverse Behaviors | NIPS2017 | 1707.02747 | imitation learning: VAE (behavioral cloning) + GAIL |
| Imitation from Observation - Learning to Imitate Behaviors from Raw Video via Context Translation | ICRA2018 | 1707.03374 | imitation learning from observation, context translation |
| Reverse Curriculum Generation for Reinforcement Learning | CoRL2017 | 1707.05300 | reverse curriculum |
| Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards | - | 1707.08817 | DDPGfD: DDPG + DQfD, off-policy imitation, continuous goal-based envs |
| When Waiting is not an Option - Learning Options with a Deliberation Cost | AAAI2018 | 1709.04571 | HRL, A2OC: A3C + option-critic + deliberation cost |
| Autonomous Extracting a Hierarchical Structure of Tasks in Reinforcement Learning and Multi-task Reinforcement Learning | - | 1709.04579 | HRL, association rules |
| One-Shot Visual Imitation Learning via Meta-Learning | CoRL2017 | 1709.04905 | MIL: meta-learning (MAML) + imitation learning (BC) |
| Overcoming Exploration in Reinforcement Learning with Demonstrations | ICRA2018 | 1709.10089 | similar to DDPGfD: imitation + DDPG + HER |
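The HER note above (1707.01495) only hints at the core trick: failed episodes still become useful training data once their goals are relabeled with goals that were actually achieved. Below is a minimal sketch of the "future" relabeling strategy, assuming transitions are stored as plain dicts and `reward_fn(achieved_goal, goal)` computes the environment's sparse reward; the function and key names are illustrative, not the paper's reference code.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling sketch ("future" strategy).

    Assumptions (not from the paper's code): each transition in `episode`
    is a dict with keys 'obs', 'action', 'next_obs', 'achieved_goal'
    (the goal reached after the transition), 'desired_goal', 'reward'.
    For every transition we also store k copies whose desired goal is
    replaced by a goal achieved later in the same episode, so a failed
    episode still yields positive reward signals.
    """
    relabeled = []
    for t, tr in enumerate(episode):
        relabeled.append(dict(tr))  # keep the original transition
        for _ in range(k):
            future = random.choice(episode[t:])   # a step from the rest of the episode
            new_goal = future["achieved_goal"]    # pretend this was the goal all along
            new_tr = dict(tr)
            new_tr["desired_goal"] = new_goal
            new_tr["reward"] = reward_fn(tr["achieved_goal"], new_goal)
            relabeled.append(new_tr)
    return relabeled
```

The relabeled transitions go into an ordinary off-policy replay buffer (e.g., for DQN or DDPG), which is why HER combines cleanly with the demonstration-based methods listed above.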