theSquaredError/rl-resources

rl-resources

Eligibility traces and TD(lambda)

deep reinforcement learning

ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION

Playing atari with Deep Reinforcement learning

Deep Q-Learning

Policy Gradients

Function approximation

An Analysis of Temporal-Difference Learning with Function Approximation

Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation

Universal Approximation Theoram

Approximation by superposition of sigmoidal function

MultiLayer FeedForward networks are universal approximator

Multi-agent Reinforcement Learning

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

Gumbel Softmax

THE CONCRETE DISTRIBUTION: A CONTINUOUS RELAXATION OF DISCRETE RANDOM VARIABLES

CATEGORICAL REPARAMETERIZATION WITH GUMBEL-SOFTMAX

Video Explanation