How are the 1 and -1 rewards used?
guotong1988 opened this issue · 1 comments
guotong1988 commented
From the code here, I see that all transitions are added to the same deque, whatever their reward. To learn from the 1 and -1 rewards, we have to happen to sample those transitions from the deque. Do you think this makes learning slow?
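Restated (translated from Chinese): are the transitions with reward 1 and -1 also stored in the same deque? If so, isn't the probability of sampling them very low, which would make the feedback very slow?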
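To make the question concrete, here is a minimal sketch of a deque-based replay memory with uniform sampling. It is not the repository's code: the names `REPLAY_MEMORY`, `BATCH_SIZE`, `fill_memory`, and `nonzero_fraction`, the capacity and batch size, and the 5% rate of nonzero rewards are all assumptions chosen for illustration.

```python
import random
from collections import deque

REPLAY_MEMORY = 50000  # assumed capacity of the replay deque
BATCH_SIZE = 32        # assumed minibatch size


def fill_memory(num_steps, nonzero_prob, seed=0):
    """Fill a replay deque with dummy transitions (hypothetical helper).

    Each transition is (state, action, reward, next_state, terminal).
    A reward of 1 or -1 occurs with probability nonzero_prob, else 0.
    """
    rng = random.Random(seed)
    memory = deque(maxlen=REPLAY_MEMORY)  # oldest entries drop off when full
    for _ in range(num_steps):
        if rng.random() < nonzero_prob:
            reward = rng.choice([1, -1])
        else:
            reward = 0
        terminal = reward == -1  # assumption: -1 marks the end of an episode
        memory.append(("state", 0, reward, "next_state", terminal))
    return memory


def nonzero_fraction(transitions):
    """Share of transitions whose reward is not zero."""
    return sum(1 for t in transitions if t[2] != 0) / len(transitions)


memory = fill_memory(num_steps=10000, nonzero_prob=0.05)

# Uniform sampling: every stored transition is equally likely to be drawn.
minibatch = random.sample(list(memory), BATCH_SIZE)

print("nonzero share in memory:   ", nonzero_fraction(memory))
print("nonzero share in minibatch:", nonzero_fraction(minibatch))
```

Under uniform sampling, the expected share of 1 and -1 transitions in a minibatch equals their share in the deque, so rare rewards are drawn rarely, as the question suspects. Whether that actually slows training in this project is not something the sketch can show.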
@songrotek Thank you.