seungeunrho/minimalRL

torch.gather in relation to policy gradient

migom6 opened this issue · 0 comments

From my understanding, the policy network outputs a mean and variance for a single action. After that, torch.gather is used to calculate the log_prob. Can someone help me understand how this works?
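For context, here is a minimal sketch of the pattern I am asking about. It assumes a discrete-action policy whose network outputs one probability per action (a softmax head), which may not match the actual code in this repository. The tensor names and values (`probs`, `actions`) are hypothetical and only illustrate what `gather` selects.

```python
import torch

# Hypothetical batch of 3 states with 2 discrete actions each.
# Each row is the policy's probability distribution over the actions.
probs = torch.tensor([[0.7, 0.3],
                      [0.2, 0.8],
                      [0.5, 0.5]])

# Hypothetical actions that were actually taken, one per state.
# gather needs an int64 index with the same number of dims as probs.
actions = torch.tensor([[0], [1], [1]])

# Along dim=1, gather picks probs[i, actions[i]] for every row i,
# i.e. the probability the policy assigned to the action taken.
pi_a = probs.gather(1, actions)   # shape (3, 1)

# The log of that probability is the log_prob term in the
# policy gradient loss.
log_prob = torch.log(pi_a)
```

If this reading is right, `gather` is just indexing the probability of the chosen action in each row, which would apply to a categorical policy rather than to a mean and variance output.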
Thanks for the help. 😃