datawhalechina/easy-rl
Chinese-language reinforcement learning tutorial (the Mushroom Book 🍄). Read online at https://datawhalechina.github.io/easy-rl/
Jupyter Notebook · NOASSERTION
Issues
- 0
RuntimeError: in a2c.py An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module
#166 opened by chensisi0730 - 3
- 2
There is one suggestion for a word
#165 opened by shi-yang999 - 2
Is there a problem with the proof that the value function is monotonic (Chapter 3, Section 4, page 60 of the book)?
#163 opened by lixinliu1995 - 1
Question about the DDPG algorithm in the book
#146 opened by yxz777 - 2
Two possible bugs in DuelingDQN.ipynb
#140 opened by libermeng - 0
Error when running Q-learning探索策略研究.ipynb: AttributeError: 'numpy.random._generator.Generator' object has no attribute 'rand'
#164 opened by jay-bo - 1
Suggestion about the overall structure of the book
#161 opened by chenslcool - 2
In the PPO implementation, why take the logarithm of the probabilities?
#147 opened by chzhan - 0
/chapter14/chapter14
#157 opened by qiwang067 - 0
/chapter14/chapter14
#156 opened by qiwang067 - 0
/chapter14/chapter14
#158 opened by qiwang067 - 3
Error in Equation (9.3) of Section 9.3, "Advantage Actor-Critic"
#155 opened by Sjtu-hyg - 3
- 1
Minor flaw in the plotting code of notebooks/Q-learning/QLearning.ipynb
#153 opened by 976213951 - 2
How do I run the demo program on a Linux server?
#124 opened by bjzhb666 - 2
PPO for continuous action spaces
#149 opened by YZH-WDNMD - 1
Question about the derivation of the conditional law of total expectation
#152 opened by SacuraA - 1
How was the print edition produced?
#151 opened by powergiant - 0
- 0
the version of numpy
#125 opened by HanggeAi - 0
Problem with the DDPG implementation
#144 opened by yxz777 - 1
Should the label in the lower-left corner of Figure 6.8 be "action value (Q)"?
#143 opened by xuleimath - 0
When I run the DQN code, the initial state always contains one extra value.
#142 opened by yxz777 - 1
- 1
4.3 REINFORCE: Monte Carlo Policy Gradient
#135 opened by sungaok - 2
Typo
#139 opened by ConnorSiXiong - 2
Can a PDF of the latest version be released?
#137 opened by chensisi0730 - 1
value_iteration algorithm does not converge?
#138 opened by chensisi0730 - 6
Where is the code that accompanies the book?
#129 opened by GoWithWind2015 - 3
- 2
- 1
Errata for Chapter 5
#130 opened by notomatoes - 1
Edit problem in Chapter3
#128 opened by mvllwong - 1
Is the annotation in Figure 4.10 of Chapter 4 incorrect?
#127 opened by njwm - 3
1.7.1 Gym example: more values are returned now
#126 opened by neverevergiveup - 1
reward_batch is missing .unsqueeze(1) in DoubleDQN's update()
#121 opened by beerjtu - 1
The update function code for DoubleDQN and DQN appears to be identical
#123 opened by FinnJob - 1
Spelling mistake
#122 opened by d3ac - 1
MonteCarlo code error
#120 opened by beifeng1937 - 1
PPO advantage calculation
#114 opened by XinXU-USTC - 1
Will MARL algorithms be added in the future?
#108 opened by pmy0721 - 1
Q-learning error
#111 opened by ZHUGUODONG1 - 1
The conda environment now needs to use python==3.8
#115 opened by ExileSaber - 2
Is a .py file missing from the common folder?
#112 opened by zl-yang - 1
Error in the DQN code
#116 opened by Solitario119 - 3
Issue with the empirical mean in "3.3.1 Monte Carlo Policy Evaluation"
#113 opened by paulyzhangSmartNews - 1
Writing error
#110 opened by tools-only - 1
The explanation of how TD3 target policy smoothing works is inconsistent with the original paper
#109 opened by mabaoer - 1