Reinforcement Learning: Theory and Python Implementation (强化学习：原理与Python实现)

The world's first reinforcement learning tutorial book with accompanying TensorFlow 2 code

The first printed algorithm book with accompanying TensorFlow 2 code

This book introduces reinforcement learning theory and its Python implementation.

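Complete theory: The book uses one consistent mathematical framework to present the theoretical foundations of reinforcement learning rigorously, and gives proofs for the main theorems. The chapters build on one another and cover all mainstream reinforcement learning algorithms, including non-deep algorithms such as eligibility traces and deep algorithms such as soft actor-critic.
Rich examples: The algorithms are implemented with the latest Python 3.8, Gym 0.17, and TensorFlow 2.2 (compatible with TensorFlow 1.15) on the operating system of your choice (Windows, macOS, or Linux). All implementations follow one uniform style and are compact and lightweight. Chapters 1 to 9 provide an implementation for each algorithm; their environments depend only on the minimal installation of Gym, so they also run on computers without a GPU. Chapters 10 to 12 present several popular comprehensive case studies that cover the full installation of Gym and custom extensions, and they run on computers with an ordinary GPU.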
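Before running the chapter code, it can help to confirm that the installed packages match the versions named above. The following is only a minimal sketch, not part of the book's code: the helper names (`installed_version`, `matches_release`, `check_environment`) are hypothetical, and it assumes the packages are installed under the distribution names `gym` and `tensorflow`.

```python
from importlib import metadata  # in the standard library since Python 3.8

# Releases named in the text above. The distribution names ("gym",
# "tensorflow") are an assumption of this sketch.
EXPECTED = {"gym": "0.17", "tensorflow": "2.2"}


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def matches_release(version, release):
    """True if `version` equals `release` or is a patch release of it."""
    if version is None:
        return False
    return version == release or version.startswith(release + ".")


def check_environment(expected=EXPECTED):
    """Map each distribution to (installed version, whether it matches)."""
    report = {}
    for name, release in expected.items():
        version = installed_version(name)
        report[name] = (version, matches_release(version, release))
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        status = "ok" if ok else "differs from the version named above"
        print(f"{name}: {version or 'not installed'} ({status})")
```

A mismatch does not necessarily mean the code will fail; the check only reports whether the local setup differs from the versions the text states.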

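Introduction to Reinforcement Learning. View code: useGym
Markov Decision Process. View code: useBellman, CliffWalking
Model-based Numerical Iteration. View code: FrozenLake
Monte Carlo Value Iteration. View code: Blackjack
Temporal Difference Value Iteration. View code: Taxi
Function Approximation Methods. View code: MountainCar
Monte Carlo Policy Gradient Methods. View code: CartPole
Actor-Critic Methods. View code: Acrobot
Deterministic Policies for Continuous Action Spaces. View code: Pendulum
Case Study: Video Games. View code: Breakout, Pong, Seaquest
Case Study: Board Games. View code: TicTacToe, Reversi, boardgame2
Case Study: Self-Driving. View code: AirSimNH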

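QQ Group

Group number: 935702193 (free to join)
About the verification question for joining: because of a bug in QQ, verification may fail even when the answer is entered correctly. Retrying on a different device, with a different input method, or on another day may resolve the problem. If the answer contains English letters, please pay attention to capitalization. The first letter of a person's name should be capitalized.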

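Errata and Updates

August 2019, 1st edition, 1st printing: view errata and updates
November 2019, 1st edition, 2nd printing: view errata and updates
December 2019, 1st edition, 3rd printing: view errata and updates
No errata or updates are provided for the e-book edition.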

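How to Determine the Printing and Printing Date of a Print Copy

One page before the Preface carries the "Cataloguing in Publication (CIP) data". The table in the lower part of that page has an entry labeled "版次" (edition and printing), which states when the copy was printed and which printing it is.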

Table of Mathematical Notation

Download PDF

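E-book Edition

The book is sold both in print and as an e-book. However, the e-book does not come with the accompanying errata and updates, so the print edition is recommended. Platforms that sell the e-book include, but are not limited to:

华章鲜读: follow the WeChat official account "华章电子书", open "在线书城" (online bookstore), search for "强化学习", and find the book under the "鲜读" section
Kindle e-book: https://www.amazon.cn/dp/B07X936G34/
JD Read (京东读书): https://e.jd.com/30513215.html
Zhihu Bookstore (知乎书店): https://www.zhihu.com/pub/reader/119634282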

Reinforcement Learning: Theory and Python Implementation

The First Reinforcement Learning Tutorial Book with TensorFlow 2 Implementation

This is a tutorial book on reinforcement learning that covers both the theory and its Python implementation.

Theory: Starting from a uniform mathematical framework, this book derives the theory and algorithms of reinforcement learning, including all major algorithms such as eligibility traces and soft actor-critic algorithms.
Practice: Every chapter is accompanied by a high-quality implementation based on Python 3.7, Gym 0.17, and TensorFlow 2.1.

Introduction to Reinforcement Learning
Markov Decision Process
Model-based Numeric Iteration
Monte-Carlo Learning
Temporal Difference Learning
Function Approximation
Policy Gradient
Actor-Critic
Deterministic Policy Gradient
Case Study: Video Game
Case Study: Board Game
Case Study: Self-Driving Car

BibTeX

@book{xiao2019,
 title     = {Reinforcement Learning: Theory and {Python} Implementation},
 author    = {Zhiqing Xiao},
 year      = 2019,
 month     = 8,
 publisher = {China Machine Press},
}
