Formula error in Section 14.3
StevenJokess commented
https://hrl.boyuai.com/chapter/2/sac%E7%AE%97%E6%B3%95#143-soft-%E7%AD%96%E7%95%A5%E8%BF%AD%E4%BB%A3
It should read:
where the state-value function is written as
$$
V\left(s_t\right)=\mathbb{E}_{a_t \sim \pi}\left[Q\left(s_t, a_t\right)-\alpha \log \pi\left(a_t \mid s_t\right)\right]=\mathbb{E}_{a_t \sim \pi}\left[Q\left(s_t, a_t\right)\right]+\alpha H\left(\pi\left(\cdot \mid s_t\right)\right)
$$
The original text omits the $\alpha$.
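
As a quick sanity check, here is a minimal numerical sketch of the identity for a discrete policy. The Q-values, action probabilities, and `alpha` below are made-up illustrative numbers, and the helper names (`soft_state_value`, `expected_q`, `entropy`) are hypothetical and not taken from the book's code. It only shows that the two sides agree when the entropy term is scaled by `alpha`, and disagree when `alpha` is dropped (unless `alpha = 1`).

```python
import math

def soft_state_value(q_values, probs, alpha):
    """E_{a~pi}[Q(s, a) - alpha * log pi(a|s)] for a discrete policy."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, probs) if p > 0)

def expected_q(q_values, probs):
    """E_{a~pi}[Q(s, a)]."""
    return sum(p * q for q, p in zip(q_values, probs))

def entropy(probs):
    """H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Illustrative values only (assumptions, not from the source).
q_values = [1.0, 2.0, 0.5]
probs = [0.2, 0.5, 0.3]
alpha = 0.2

lhs = soft_state_value(q_values, probs, alpha)
rhs = expected_q(q_values, probs) + alpha * entropy(probs)
rhs_without_alpha = expected_q(q_values, probs) + entropy(probs)

print(math.isclose(lhs, rhs))                # identity with alpha
print(math.isclose(lhs, rhs_without_alpha))  # identity with alpha dropped
```

With these numbers the first check holds and the second does not, which matches the correction above.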
For more, see my project: https://github.com/StevenJokess/d2rl/blob/master/chapter/SAC.md
QQ group, come say hi: 171097552