A non-exhaustive collection of papers on intrinsic motivation and unsupervised RL.
- Extrinsic rewards can be sparse or difficult to design, making it hard for an agent to efficiently learn about the environment and how to achieve an objective.
- Intrinsic rewards can enable agents to discover meaningful behavior without external supervision.
- They can also help us understand how well a learned model generalizes.
- Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015
- maximize the mutual information (MI) between an action sequence and the resulting state, given the current state
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015
- reward novelty as measured by (encoded) state prediction error
- Curiosity-driven Exploration by Self-supervised Prediction, 2017
- reward novelty as measured by (encoded) state prediction error
- learn what is controllable/relevant through an inverse dynamics model
- VIME: Variational Information Maximizing Exploration, 2017
- reward the information gain (IG) in a Bayesian neural network dynamics model from observing new transitions
- Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning, 2017
- reward the surprise of seeing the state that is transitioned to
- Model-Based Active Exploration, 2019
- reward the information gain (IG) in the dynamics model from observing new transitions, measured by the Jensen-Shannon divergence (JSD) of an ensemble of dynamics models
- Large-Scale Study of Curiosity-Driven Learning, 2018
- detailed study on practical considerations
- Unsupervised Control Through Non-Parametric Discriminative Rewards, 2019
- goal-conditioned policy with a "goal achievement reward"
- Diversity is All You Need: Learning Skills without a Reward Function, 2019
- learn distinguishable and diverse skills by learning to infer skill from states
- Mutual Information State Intrinsic Control, ICLR2021
- maximize the MI between surrounding state and agent state
- NovelD: A Simple yet Effective Exploration Criterion, NeurIPS2021
- reward the increase in novelty (measured by Random Network Distillation) to achieve BFS-like exploration
- SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments, ICLR2021
- minimize the state entropy
- Information is Power: Intrinsic Control via Information Capture, NeurIPS2021
- minimize the state visitation entropy in a partially observable setting
- Exploration column below: *state* means wide coverage of the state space; *behavior* means meaningful action sequences that lead to particular states.
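The prediction-error idea behind entries like ICM can be sketched in a few lines: train a forward model online and pay the agent its own prediction error, so unfamiliar transitions are rewarded. Everything below (the linear model, dimensions, learning rate) is an illustrative toy, not code from any of the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear forward model predicting s' from (s, a). The intrinsic reward
# is the squared prediction error: transitions the model cannot yet predict
# (novel ones) earn more reward. All dimensions/rates are illustrative.
STATE_DIM, ACTION_DIM, LR = 4, 2, 0.01
W = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + ACTION_DIM))

def intrinsic_reward_and_update(s, a, s_next):
    """Return the prediction-error reward, then take one SGD step on the model."""
    global W
    x = np.concatenate([s, a])
    err = W @ x - s_next
    reward = float(err @ err)       # novelty = squared prediction error
    W -= LR * np.outer(err, x)      # gradient step on 0.5 * ||err||^2
    return reward

# A transition seen over and over becomes predictable, so its reward decays.
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
s_next = rng.normal(size=STATE_DIM)
rewards = [intrinsic_reward_and_update(s, a, s_next) for _ in range(300)]
```

ICM additionally takes the error in a feature space learned by an inverse dynamics model, so that only controllable/relevant aspects of the state contribute; the raw-state version above is the simplest variant of the idea.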
Method | Motivation | Dynamics Model | Model-Free RL | Scope | Exploration
---|---|---|---|---|---
ICM[3] | novelty | ✅ | ✅ | global | state
VIME[4] | information | ✅ | ✅ | global | state
DIAYN[9] | skill | ❌ | ✅ | global | behavior
MUSIC[10] | control | ❌ | ✅ | global | behavior
NovelD[11] | novelty difference | ❌ | ✅ | both | state
SMiRL[12] | certainty in state | ✅ | ✅ | episodic | behavior
IC2[13] | certainty in state visitation | ✅ | ✅ | episodic | behavior
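The skill-discovery objective behind DIAYN can likewise be sketched with a toy discriminator: q(z|s) is trained to infer the skill z from the visited state s, and skill z's policy is rewarded with log q(z|s) − log p(z). The cluster centers standing in for skill-conditioned policies, the linear softmax discriminator, and all constants are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# DIAYN-style reward sketch: a softmax discriminator q(z|s) infers which
# skill z produced state s; skill z is rewarded log q(z|s) - log p(z) with
# a uniform prior p(z) = 1/K. Here each "skill" simply visits states near
# its own center, standing in for a skill-conditioned policy.
K, STATE_DIM, LR = 3, 2, 0.1
centers = rng.normal(scale=3.0, size=(K, STATE_DIM))  # where each skill goes
W = np.zeros((K, STATE_DIM))                          # discriminator weights

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def diayn_reward_and_update(s, z):
    """Return log q(z|s) - log p(z), then one SGD step on cross-entropy."""
    global W
    q = softmax(W @ s)
    reward = float(np.log(q[z] + 1e-8) - np.log(1.0 / K))
    grad = np.outer(q, s)
    grad[z] -= s                    # gradient of -log q[z] w.r.t. W
    W -= LR * grad
    return reward

# With an untrained discriminator the reward is ~0 (chance level); as the
# discriminator learns to tell the skills apart, it approaches log K.
rewards = []
for _ in range(500):
    z = int(rng.integers(K))
    s = centers[z] + rng.normal(scale=0.3, size=STATE_DIM)
    rewards.append(diayn_reward_and_update(s, z))
```

Maximizing this reward pushes the skills toward visiting distinguishable regions of the state space, which is the diversity objective the paper optimizes jointly with the policy.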