issue about the generation of action

Question

Closed this issue 6 years ago · 3 comments

I have a question about how the action is generated. In your code, the action is generated as follows:
action = mu + np.sqrt(sigma) * epsilon
Do mu and sigma denote the mean and stddev of the normal distribution over actions?
In your code they seem to represent the action and td_error respectively, so I am confused about these two parameters.
I have also seen this pattern in many other codebases. Could you explain this piece of code when you have time?
Writing and reading English is tiring, so feel free to reply in Chinese. Thank you!

Answer 1 · 2019-05-13T14:30:22.000Z

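@PacificBase They do represent the mean and stddev, not the action and td_error. The action is sampled from a distribution, and td_error is obtained from two successive value computations, so neither of them is produced in a single forward pass of the actor. The action and td_error are passed in as y_true and are only used in the loss computation.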
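
To make the sampling step concrete, here is a minimal NumPy sketch of drawing an action from the actor's outputs. It assumes the second output is a variance, which is one reading of the `np.sqrt(sigma)` in the quoted line; if that output is already a standard deviation, the square root would be dropped. The names `sample_action` and `sigma_sq` and the numeric values are hypothetical and not taken from the repository.

```python
import numpy as np

def sample_action(mu, sigma_sq, rng):
    """Draw one action from N(mu, sigma_sq) as mu + std * epsilon."""
    # epsilon is standard normal noise; this is where the randomness enters
    epsilon = rng.standard_normal(np.shape(mu))
    # assumption: sigma_sq is a variance, so its square root is the stddev
    return mu + np.sqrt(sigma_sq) * epsilon

rng = np.random.default_rng(0)
mu = np.array([0.5])         # hypothetical actor output: mean
sigma_sq = np.array([0.04])  # hypothetical actor output: variance (std = 0.2)

samples = np.array([sample_action(mu, sigma_sq, rng) for _ in range(10000)])
print(samples.mean(), samples.std())  # should be close to 0.5 and 0.2
```

The actor's forward pass only yields `mu` and `sigma_sq`; the sampled action and the TD error come from outside that pass and enter only through the loss.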

Answer 2 · 2019-05-15T01:45:37.000Z

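So action = mu + np.sqrt(sigma) * epsilon amounts to the mean plus a random epsilon multiplied by the stddev, where epsilon is what provides the random sampling.
One more question: when computing the log-probability of the action, why is the original pdf additionally multiplied by an epsilon?
Sorry to bother you; this is my first time writing this kind of code myself.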

Answer 3 · 2019-05-15T14:15:27.000Z

@PacificBase It is there to prevent log(0).
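
As an illustration of the log(0) problem, the sketch below shows a Gaussian density underflowing to exactly zero for an action far from the mean, and a small constant keeping the log finite. It assumes the common form in which the constant is added to the density before taking the log, which may differ from what the repository's code does. `EPS`, `gaussian_pdf`, `safe_log_prob`, and the numeric values are hypothetical.

```python
import numpy as np

EPS = 1e-10  # hypothetical small constant; the repository may use another value

def gaussian_pdf(action, mu, sigma_sq):
    """Density of N(mu, sigma_sq) evaluated at action."""
    return np.exp(-0.5 * (action - mu) ** 2 / sigma_sq) / np.sqrt(2.0 * np.pi * sigma_sq)

def safe_log_prob(action, mu, sigma_sq, eps=EPS):
    """Log-density with eps added before the log, so the result stays finite."""
    return np.log(gaussian_pdf(action, mu, sigma_sq) + eps)

mu = np.array([0.0])
sigma_sq = np.array([0.01])
far_action = np.array([100.0])  # far enough that the density underflows

pdf = gaussian_pdf(far_action, mu, sigma_sq)
log_p = safe_log_prob(far_action, mu, sigma_sq)
print(pdf, log_p)  # pdf underflows to 0.0, while log_p remains finite
```

Without the constant, the log of the underflowed density would be negative infinity, and that value would propagate into the loss and its gradients.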