datawhalechina/easy-rl
Chinese-language reinforcement learning tutorial (the Mushroom Book 🍄); read online at https://datawhalechina.github.io/easy-rl/
Jupyter Notebook, NOASSERTION
Issues
- 2
Is there a problem with the proof that the value function is monotonic (Chapter 3, Section 4, page 60 of the book)?
#163 opened by lixinliu1995 - 1
Question about the DDPG algorithm in the book
#146 opened by yxz777 - 2
Two possible bugs in DuelingDQN.ipynb
#140 opened by libermeng - 0
Error when running Q-learning探索策略研究.ipynb: AttributeError: 'numpy.random._generator.Generator' object has no attribute 'rand'
#164 opened by jay-bo - 1
Suggestions on the overall structure of the book
#161 opened by chenslcool - 1
- 2
In the PPO implementation, why take the logarithm of the probabilities?
#147 opened by chzhan - 0
/chapter14/chapter14
#157 opened by qiwang067 - 0
/chapter14/chapter14
#156 opened by qiwang067 - 0
/chapter14/chapter14
#158 opened by qiwang067 - 3
Error in Equation (9.3) of "9.3 Advantage Actor-Critic Algorithm"
#155 opened by Sjtu-hyg - 3
- 1
Minor flaw in the plotting code of notebooks/Q-learning/QLearning.ipynb
#153 opened by 976213951 - 2
How do I run the demo programs on a Linux server?
#124 opened by bjzhb666 - 2
PPO algorithm for continuous action spaces
#149 opened by YZH-WDNMD - 1
Question about the derivation of the conditional law of total expectation
#152 opened by SacuraA - 1
How was the print edition produced?
#151 opened by powergiant - 0
- 0
the version of numpy
#125 opened by HanggeAi - 0
Problem with the DDPG algorithm implementation
#144 opened by yxz777 - 1
Should the label in the lower-left corner of Figure 6.8 be "action value (Q)"?
#143 opened by xuleimath - 0
When I run the DQN code, the initial state always contains one extra value.
#142 opened by yxz777 - 1
- 1
4.3 REINFORCE: Monte Carlo policy gradient
#135 opened by sungaok - 2
Typos
#139 opened by ConnorSiXiong - 2
Can a PDF of the latest version be released?
#137 opened by chensisi0730 - 1
value_iteration algorithm does not converge?
#138 opened by chensisi0730 - 6
Where is the code that accompanies the book?
#129 opened by GoWithWind2015 - 3
- 2
- 1
Chapter 5 errata
#130 opened by notomatoes - 1
Editing problem in Chapter 3
#128 opened by mvllwong - 1
Is the annotation on Figure 4.10 in Chapter 4 incorrect?
#127 opened by njwm - 3
1.7.1 Gym example: the number of return values has increased
#126 opened by neverevergiveup - 1
reward_batch in DoubleDQN's update() is missing .unsqueeze(1)
#121 opened by beerjtu - 1
The update function code for DoubleDQN and DQN appears to be identical
#123 opened by FinnJob - 1
Spelling mistake
#122 opened by d3ac - 1
MonteCarlo code error
#120 opened by beifeng1937 - 1
PPO advantage calculation
#114 opened by XinXU-USTC - 2
Could you provide the versions of the main libraries used in the code?
#105 opened by LeonardWan - 1
Will MARL algorithms be added in the future?
#108 opened by pmy0721 - 1
Q-learning error
#111 opened by ZHUGUODONG1 - 1
The conda environment now needs to be switched to python==3.8
#115 opened by ExileSaber - 2
Is a .py file missing from the common folder?
#112 opened by zl-yang - 1
DQN code error
#116 opened by Solitario119 - 3
Issue with the empirical mean in "3.3.1 Monte Carlo Policy Evaluation"
#113 opened by paulyzhangSmartNews - 1
Writing error
#110 opened by tools-only - 1
The explanation of how TD3 target policy smoothing works is inconsistent with the original paper
#109 opened by mabaoer - 1
- 1