xylee95/Spatiotemporal-Attack-On-Deep-RL-Agents

Function compute_grad()

Closed this issue · 2 comments

Sorry to bother you. I do not understand the function compute_grad() in ppo_inference.py. You did not mention it in your paper.
Thanks.
# compute analytical gradient
coeff = -(action - means) / ((np.power(std_devs, 3) * (np.sqrt(2 * np.pi))))
power = -(np.power((action - means), 2)) / (2 * np.power(std_devs, 2))
exp = np.exp(power)
grad_a = coeff * exp

Hi there,

The function compute_grad() computes the gradient of the action probability density with respect to the action. In PPO, we parameterize the action distribution as a Gaussian, so we can differentiate the Gaussian density directly and obtain the gradient analytically. We then use this gradient to perform gradient descent and find the action with the lowest probability.
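To make the idea concrete, here is a minimal sketch of the analytical gradient and a descent loop built on it. The gradient expression is the one quoted in the question; gaussian_pdf(), descend(), step_size and n_steps are hypothetical names and values chosen for illustration and are not taken from ppo_inference.py. Note that the gradient is exactly zero at the mean, so the starting action should be offset from it.

```python
import numpy as np


def gaussian_pdf(action, means, std_devs):
    """Element-wise density of N(means, std_devs^2) evaluated at action."""
    z = (action - means) / std_devs
    return np.exp(-0.5 * z ** 2) / (std_devs * np.sqrt(2 * np.pi))


def compute_grad(action, means, std_devs):
    """Analytical derivative of the Gaussian density with respect to action."""
    coeff = -(action - means) / (np.power(std_devs, 3) * np.sqrt(2 * np.pi))
    power = -np.power(action - means, 2) / (2 * np.power(std_devs, 2))
    return coeff * np.exp(power)


def descend(action, means, std_devs, step_size=0.1, n_steps=20):
    """Hypothetical loop: step against the density gradient to lower the density."""
    action = np.asarray(action, dtype=float).copy()
    for _ in range(n_steps):
        action = action - step_size * compute_grad(action, means, std_devs)
    return action


# Illustrative values only (assumed, not from the repository).
means = np.array([0.0, 1.0])
std_devs = np.array([1.0, 0.5])
start = means + 0.1  # offset from the mean, where the gradient vanishes
end = descend(start, means, std_devs)
```

Each step moves the action further from the mean, so the density at end is lower than at start in every dimension.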

This is a minor detail explained in footnote #4 of the paper. You could also sample different actions to estimate the gradient numerically, as is done for DQN, but since the distribution is known in PPO, computing the analytical gradient is much more efficient.
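As a rough illustration of the trade-off, the sketch below compares the analytical gradient with a numerical estimate. It uses a plain central finite difference, which is an assumption made for brevity and not the sampling procedure used for DQN in the repository. The names analytical_grad(), numerical_grad() and eps are hypothetical. The numerical version needs two extra density evaluations per gradient, while the analytical one needs none.

```python
import numpy as np


def gaussian_pdf(action, means, std_devs):
    """Element-wise density of N(means, std_devs^2) evaluated at action."""
    z = (action - means) / std_devs
    return np.exp(-0.5 * z ** 2) / (std_devs * np.sqrt(2 * np.pi))


def analytical_grad(action, means, std_devs):
    """Closed-form derivative of the Gaussian density with respect to action."""
    coeff = -(action - means) / (np.power(std_devs, 3) * np.sqrt(2 * np.pi))
    return coeff * np.exp(-np.power(action - means, 2) / (2 * np.power(std_devs, 2)))


def numerical_grad(action, means, std_devs, eps=1e-5):
    """Central finite difference; valid per dimension because the density is element-wise."""
    upper = gaussian_pdf(action + eps, means, std_devs)
    lower = gaussian_pdf(action - eps, means, std_devs)
    return (upper - lower) / (2 * eps)


# Illustrative values only (assumed, not from the repository).
means = np.array([0.0, 1.0, -2.0])
std_devs = np.array([1.0, 0.5, 2.0])
action = np.array([0.3, 1.4, -1.0])

exact = analytical_grad(action, means, std_devs)
approx = numerical_grad(action, means, std_devs)
agree = np.allclose(exact, approx, atol=1e-6)
```

The two estimates should agree to within the finite-difference error, which is a quick sanity check on the closed-form expression.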

Please let me know if you have more concerns!

Thank you for your timely reply. I do not have any other questions.