Adam: A Method for Stochastic Optimization
Playing Atari with Deep Reinforcement Learning
An Analysis of Temporal-Difference Learning with Function Approximation
Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation
Approximation by Superpositions of a Sigmoidal Function
Multilayer Feedforward Networks Are Universal Approximators
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables