A non-exhaustive collection of papers on intrinsic motivation and unsupervised RL.
- Extrinsic rewards can be sparse or difficult to design, making it hard for an agent to efficiently learn about the environment and how to achieve an objective.
- Intrinsic rewards can enable agents to discover meaningful behavior without external supervision.
- They can also help us understand how well a learned model generalizes.
- Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015
- maximize the mutual information (MI) between an action sequence and the resulting state, given the current state
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015
- reward novelty as measured by (encoded) state prediction error
- Curiosity-driven Exploration by Self-supervised Prediction, 2017
- reward novelty as measured by (encoded) state prediction error
- learn what is controllable/relevant through an inverse dynamics model
- VIME: Variational Information Maximizing Exploration, 2017
- reward the information gain (IG) in a Bayesian neural network dynamics model from observing new transitions
- Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning, 2017
- reward the surprise of seeing the state that is transitioned to
- Model-Based Active Exploration, 2019
- reward the information gain (IG) in the dynamics model from observing new transitions, measured by the Jensen-Shannon divergence (JSD) of an ensemble of dynamics models
- Large-Scale Study of Curiosity-Driven Learning, 2018
- detailed study on practical considerations
- Unsupervised Control Through Non-Parametric Discriminative Rewards, 2019
- goal-conditioned policy with a "goal achievement reward"
- Diversity is All You Need: Learning Skills without a Reward Function, 2019
- learn distinguishable and diverse skills by learning to infer skill from states
- Mutual Information State Intrinsic Control, ICLR2021
- maximize the MI between surrounding state and agent state
- NovelD: A Simple yet Effective Exploration Criterion, NeurIPS2021
- reward the increase in novelty (measured by Random Network Distillation) to achieve BFS-like exploration
- SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments, ICLR2021
- minimize the state entropy
- Information is Power: Intrinsic Control via Information Capture, NeurIPS2021
- minimize the state visitation entropy in a partially observable setting
- Exploration column below: *state* means wide coverage of the state space; *behavior* means meaningful action sequences that lead to particular states.
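The prediction-error idea behind entries like ICM can be sketched in a few lines: train a forward model online and pay the agent its own prediction error, so unfamiliar transitions are rewarded. Everything below (the linear model, dimensions, learning rate) is an illustrative toy, not code from any of the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear forward model predicting s' from (s, a). The intrinsic reward
# is the squared prediction error: transitions the model cannot yet predict
# (novel ones) earn more reward. All dimensions/rates are illustrative.
STATE_DIM, ACTION_DIM, LR = 4, 2, 0.01
W = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + ACTION_DIM))

def intrinsic_reward_and_update(s, a, s_next):
    """Return the prediction-error reward, then take one SGD step on the model."""
    global W
    x = np.concatenate([s, a])
    err = W @ x - s_next
    reward = float(err @ err)       # novelty = squared prediction error
    W -= LR * np.outer(err, x)      # gradient step on 0.5 * ||err||^2
    return reward

# A transition seen over and over becomes predictable, so its reward decays.
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
s_next = rng.normal(size=STATE_DIM)
rewards = [intrinsic_reward_and_update(s, a, s_next) for _ in range(300)]
```

ICM additionally takes the error in a feature space learned by an inverse dynamics model, so that only controllable/relevant aspects of the state contribute; the raw-state version above is the simplest variant of the idea.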
Method | Motivation | Dynamics Model | Model-Free RL | Scope | Exploration
---|---|---|---|---|---
ICM[3] | novelty | ✅ | ✅ | global | state
VIME[4] | information | ✅ | ✅ | global | state
DIAYN[9] | skill | ❌ | ✅ | global | behavior
MUSIC[10] | control | ❌ | ✅ | global | behavior
NovelD[11] | novelty difference | ❌ | ✅ | both | state
SMiRL[12] | certainty in state | ✅ | ✅ | episodic | behavior
IC2[13] | certainty in state visitation | ✅ | ✅ | episodic | behavior
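The skill-discovery objective behind DIAYN can likewise be sketched with a toy discriminator: q(z|s) is trained to infer the skill z from the visited state s, and skill z's policy is rewarded with log q(z|s) − log p(z). The cluster centers standing in for skill-conditioned policies, the linear softmax discriminator, and all constants are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# DIAYN-style reward sketch: a softmax discriminator q(z|s) infers which
# skill z produced state s; skill z is rewarded log q(z|s) - log p(z) with
# a uniform prior p(z) = 1/K. Here each "skill" simply visits states near
# its own center, standing in for a skill-conditioned policy.
K, STATE_DIM, LR = 3, 2, 0.1
centers = rng.normal(scale=3.0, size=(K, STATE_DIM))  # where each skill goes
W = np.zeros((K, STATE_DIM))                          # discriminator weights

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def diayn_reward_and_update(s, z):
    """Return log q(z|s) - log p(z), then one SGD step on cross-entropy."""
    global W
    q = softmax(W @ s)
    reward = float(np.log(q[z] + 1e-8) - np.log(1.0 / K))
    grad = np.outer(q, s)
    grad[z] -= s                    # gradient of -log q[z] w.r.t. W
    W -= LR * grad
    return reward

# With an untrained discriminator the reward is ~0 (chance level); as the
# discriminator learns to tell the skills apart, it approaches log K.
rewards = []
for _ in range(500):
    z = int(rng.integers(K))
    s = centers[z] + rng.normal(scale=0.3, size=STATE_DIM)
    rewards.append(diayn_reward_and_update(s, z))
```

Maximizing this reward pushes the skills toward visiting distinguishable regions of the state space, which is the diversity objective the paper optimizes jointly with the policy.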