Master reinforcement learning and deep reinforcement learning by building intelligent applications using OpenAI, TensorFlow, and Python
Reinforcement Learning with Python will take your learning to the next level, helping you master everything from the fundamentals of reinforcement learning through to deep reinforcement learning. The book explains each concept from scratch and grounds it in practical applications and projects, all written in Python.
The book starts with an introduction to reinforcement learning, OpenAI Gym, and TensorFlow. You will then explore core reinforcement learning concepts and algorithms.
This example-rich guide then introduces neural networks and deep learning, covering various deep learning architectures. From there you will explore deep reinforcement learning, the combination of deep learning and reinforcement learning, and learn how these algorithms can be implemented with TensorFlow and Keras to build intelligent applications.
- 1.1. What is Reinforcement Learning
- 1.2. Reinforcement Learning Cycle
- 1.3. How RL differs from other ML paradigms
- 1.4. Elements of Reinforcement Learning
- 1.5. Agent Environment Interface (see the sketch after this list)
- 1.6. Types of RL Environments
- 1.7. Reinforcement Learning Platforms
- 1.8. Applications
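
The agent-environment interface in 1.5 boils down to a simple loop: the agent observes a state, picks an action, and the environment answers with the next state and a reward. Here is a minimal sketch in plain Python; the toy `SimpleEnv` class and the random policy are illustrative, not from the book:

```python
import random

class SimpleEnv:
    """A toy two-state environment, invented here for illustration."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0  # reward for matching the state
        self.state = random.choice([0, 1])             # transition to a random state
        return self.state, reward

env = SimpleEnv()
state, episode_return = 0, 0.0
for _ in range(10):
    action = random.choice([0, 1])      # a random policy, no learning yet
    state, reward = env.step(action)    # the environment responds
    episode_return += reward
print("return over 10 steps:", episode_return)
```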
- 2.1. Setting Up Your Machine
- 2.2. Installing Anaconda and Docker
- 2.3. Installing OpenAI Gym and Universe
- 2.4. OpenAI Gym Basics (sketched below)
- 2.5. Training a Robot to walk using Gym
- 2.6. OpenAI Gym Universe
- 2.7. Building a Video Game Bot using Universe
- 2.8. TensorFlow Installation and Fundamentals
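
As a taste of the Gym basics in 2.4, here is a minimal episode loop, assuming the classic pre-0.26 `gym` API (where `env.step` returns a 4-tuple) that the book targets:

```python
import gym

env = gym.make("CartPole-v1")
print(env.observation_space)   # Box(4,): cart position/velocity, pole angle/velocity
print(env.action_space)        # Discrete(2): push the cart left or right

state = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()          # random agent for now
    state, reward, done, _ = env.step(action)   # classic 4-tuple step API
    episode_return += reward
print("random-policy return:", episode_return)
```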
- 3.1. Markov Process and Markov Chain
- 3.2. Markov Decision Process
- 3.3. Returns and Rewards
- 3.4. Discount Factors
- 3.5. Finite and Infinite MDP
- 3.6. Value and Q functions
- 3.7. Bellman Equation and its derivation
- 3.8. Solving Bellman Equation using DP
- 3.9. Solving Frozen Lake using Value and Policy Iteration (value iteration sketched below)
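
For 3.9, here is a compact sketch of value iteration: repeatedly apply the Bellman optimality backup V(s) = max_a sum_s' p(s'|s,a)[r + gamma * V(s')] until it converges. It assumes classic gym's FrozenLake-v0 and its exposed transition model `env.P[s][a]`, a list of `(prob, next_state, reward, done)` tuples:

```python
import gym
import numpy as np

env = gym.make("FrozenLake-v0")
n_states, n_actions = env.observation_space.n, env.action_space.n
gamma, theta = 0.99, 1e-8

def backup(s, V):
    """One-step Bellman lookahead: expected return of each action from s."""
    return [sum(p * (r + gamma * V[s2]) for p, s2, r, _ in env.P[s][a])
            for a in range(n_actions)]

V = np.zeros(n_states)
while True:
    delta = 0.0
    for s in range(n_states):
        best = max(backup(s, V))              # Bellman optimality backup
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:                         # value function has converged
        break

policy = np.array([int(np.argmax(backup(s, V))) for s in range(n_states)])
print(V.reshape(4, 4))
print(policy.reshape(4, 4))   # 0=left, 1=down, 2=right, 3=up
```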
- 4.1. What is Monte Carlo?
- 4.2. Monte Carlo Prediction and Control (prediction sketched below)
- 4.3. Phases of Monte Carlo Tree Search (MCTS)
- 4.4. AlphaGo with MCTS
- 4.5. Building Tic-Tac-Toe using MCTS
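
A sketch of Monte Carlo prediction from 4.2: estimate V under a fixed policy by averaging the returns observed after the first visit to each state. This version assumes classic gym's Blackjack-v0 and an illustrative "stick on 20 or 21" policy:

```python
import gym
from collections import defaultdict

env = gym.make("Blackjack-v0")

def policy(state):
    player_sum, _, _ = state
    return 0 if player_sum >= 20 else 1      # 0 = stick, 1 = hit

returns, V = defaultdict(list), defaultdict(float)

for _ in range(50000):
    episode, state, done = [], env.reset(), False
    while not done:
        next_state, reward, done, _ = env.step(policy(state))
        episode.append((state, reward))
        state = next_state

    first_visit = {}
    for t, (s, _) in enumerate(episode):
        first_visit.setdefault(s, t)         # index of the first visit to s

    G = 0.0
    for t in range(len(episode) - 1, -1, -1):
        s, r = episode[t]
        G = r + G                            # accumulate return backwards (gamma = 1)
        if first_visit[s] == t:              # first-visit MC: record G once per episode
            returns[s].append(G)
            V[s] = sum(returns[s]) / len(returns[s])
```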
- 5.1. Temporal Difference Learning
- 5.2. TD prediction and TD error
- 5.3. On-policy TD Control - SARSA
- 5.4. Building a Maze Game using SARSA
- 5.5. Off-policy TD Control - Q-learning (sketched below)
- 5.6. Cart-Pole Balancing with Q-learning
- 5.7. R-Learning
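
Here is a minimal sketch of the chapter's off-policy control method, tabular Q-learning (5.5), again assuming classic gym's FrozenLake-v0. SARSA (5.3) differs only in the update target: it bootstraps from the action actually taken next, `Q[s2, a2]`, rather than `max(Q[s2])`:

```python
import gym
import numpy as np

env = gym.make("FrozenLake-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for _ in range(10000):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy behaviour policy
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done, _ = env.step(a)
        # Q-learning: bootstrap from the greedy action in s2 (zero if terminal)
        target = r + gamma * np.max(Q[s2]) * (not done)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1).reshape(4, 4))   # greedy policy per grid cell
```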
- 6.1. Multi-Armed Bandit Problem
- 6.2. Markov Decision Process
- 6.3. Exploration-Exploitation dilemma
- 6.4. Upper confidence bound arm selection (sketched below)
- 6.5. Thompson sampling strategy
- 6.6. Contextual bandits
- 6.7. Implementation in Python
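
A self-contained sketch of UCB arm selection (6.4) on a simulated Bernoulli bandit; the arm payout rates below are made up for illustration:

```python
import numpy as np

true_probs = [0.2, 0.5, 0.75]                # hypothetical arm payout rates
n_arms, horizon = len(true_probs), 10000
counts = np.zeros(n_arms)                    # pulls per arm
values = np.zeros(n_arms)                    # running mean reward per arm

for t in range(1, horizon + 1):
    if t <= n_arms:
        arm = t - 1                          # play each arm once to initialise
    else:
        ucb = values + np.sqrt(2 * np.log(t) / counts)   # mean + exploration bonus
        arm = int(np.argmax(ucb))
    reward = float(np.random.rand() < true_probs[arm])   # Bernoulli payout
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update

print("pulls per arm:", counts)              # most pulls go to the 0.75 arm
```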
- 7.1. Policy Gradients
- 7.2. Value vs. Policy Gradients
- 7.3. Finite difference methods
- 7.4. Likelihood ratio methods
- 7.5. Actor-Critic Algorithm and REINFORCE
- 7.6. Solving contextual bandits using policy gradients (sketched below)
- 7.7. Building a Pong game using policy gradients
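
To make the likelihood-ratio idea of 7.4 and 7.6 concrete, here is a minimal REINFORCE sketch on a toy contextual bandit: nudge the policy parameters along grad log pi(a|s), scaled by the reward. The two-context task and all names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_actions, alpha = 2, 2, 0.1
theta = np.zeros((n_contexts, n_actions))    # softmax logits, one row per context

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(5000):
    s = rng.integers(n_contexts)             # observe a context
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)       # sample an action from the policy
    reward = 1.0 if a == s else 0.0          # toy task: the best action matches s

    grad_log = -probs                        # grad of log softmax: one_hot(a) - probs
    grad_log[a] += 1.0
    theta[s] += alpha * reward * grad_log    # REINFORCE / score-function update

print(softmax(theta[0]), softmax(theta[1]))  # each context now prefers its own action
```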
- 8.1. Deep Learning
- 8.2. Neural networks and activation functions (sketched below)
- 8.3. Recurrent neural networks and LSTM
- 8.4. Song lyrics generation using LSTM-RNN
- 8.5. Convolutional neural networks
- 8.6. Image classification using CNN
- 8.7. Autoencoders
- 8.8. Why deep reinforcement learning?
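
A minimal sketch of the neural-network basics in 8.1-8.2: a small feed-forward Keras network with ReLU and sigmoid activations, learning XOR. Layer sizes and the epoch count are arbitrary illustration values:

```python
import numpy as np
from tensorflow import keras

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")      # XOR truth table

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(2,)),  # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),                 # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=2000, verbose=0)      # tiny dataset, so training is quick
print(model.predict(X).round().ravel())      # should approach [0, 1, 1, 0]
```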
- 9.1. Q Networks
- 9.2. Deep Q-network
- 9.3. Convolutional layers
- 9.4. Experience Replay
- 9.5. Separate target network (both sketched below)
- 9.6. Building an Atari game using the deep Q-network architecture in TensorFlow
- 9.7. Sentiment Analysis using DQN
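
The two stabilisers named in 9.4 and 9.5 are easy to sketch in isolation: a replay buffer that samples uncorrelated minibatches of past transitions, and a frozen target network that is periodically hard-synced to the online network. The plain weight arrays below stand in for the convolutional TensorFlow network the chapter builds:

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Experience replay (9.4): store transitions, sample random minibatches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)      # oldest transitions fall off

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)   # breaks step-to-step correlation
        s, a, r, s2, done = map(np.array, zip(*batch))
        return s, a, r, s2, done

def sync_target(online_params, target_params):
    """Separate target network (9.5): hard-copy online weights into the frozen
    copy that supplies the bootstrap target y = r + gamma * max_a' Q_target(s', a')."""
    for w_online, w_target in zip(online_params, target_params):
        w_target[:] = w_online
```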
- 10.1. POMDP
- 10.2. POMDP vs MDP
- 10.3. The problem with POMDPs
- 10.4. What is a DRQN?
- 10.5. DRQN Architecture (sketched below)
- 10.6. Applying DRQN to solve POMDP
- 10.7. Implementation in TensorFlow
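
The core of the DRQN architecture in 10.5 is to put a recurrent layer in front of the Q-value head, so the hidden state can integrate a history of partial observations. A Keras sketch with made-up shapes (the book's own implementation is built in low-level TensorFlow):

```python
from tensorflow import keras

seq_len, obs_dim, n_actions = 8, 64, 4      # hypothetical sizes

drqn = keras.Sequential([
    # the LSTM summarises the observation history, compensating for partial observability
    keras.layers.LSTM(128, input_shape=(seq_len, obs_dim)),
    keras.layers.Dense(n_actions),          # one Q-value per action
])
drqn.compile(optimizer="adam", loss="mse")
drqn.summary()
```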
- 11.1. A3C
- 11.2. Asynchronous RL framework
- 11.3. Asynchronous one-step Q-learning and SARSA
- 11.4. Asynchronous Advantage Actor-Critic (loss sketched below)
- 11.5. Building an intelligent Doom agent using the A3C algorithm
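
As a taste of 11.4, here is a sketch of the loss each A3C worker computes on its own n-step rollout before sending gradients to the shared parameters: a policy term weighted by the advantage, a value regression term, and an entropy bonus. The rollout arrays below are placeholders:

```python
import numpy as np

gamma, beta = 0.99, 0.01                     # discount and entropy weight

def a3c_loss(rewards, values, log_probs_taken, entropies, bootstrap_value):
    # n-step returns, computed backwards from the bootstrap value V(s_last)
    R, returns = bootstrap_value, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = np.array(returns[::-1])

    advantage = returns - values                           # A(s, a) = R - V(s)
    policy_loss = -(log_probs_taken * advantage).mean()    # actor: push up good actions
    value_loss = 0.5 * (advantage ** 2).mean()             # critic: regress V toward R
    entropy_bonus = -beta * entropies.mean()               # keep the policy exploratory
    return policy_loss + value_loss + entropy_bonus

# illustrative three-step rollout
print(a3c_loss(rewards=np.array([0.0, 0.0, 1.0]),
               values=np.array([0.1, 0.2, 0.5]),
               log_probs_taken=np.array([-0.7, -0.6, -0.3]),
               entropies=np.array([0.69, 0.65, 0.50]),
               bootstrap_value=0.0))
```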