Reinforcement-Learning-Simple-Tutorial

Q-learning
Sarsa
Sarsa(lambda)
DQN
Policy Gradient
Vanilla PG
Deep Deterministic Policy Gradient (DDPG)
A3C

Credit to: Morvan Python