In tf.gradients, why is -self.action_gradient needed?
GoingMyWay opened this issue · 1 comment
GoingMyWay commented
Hi, in the code of ActorNetwork:
self.unnormalized_actor_gradients = tf.gradients(
self.scaled_out, self.network_params, -self.action_gradient)
Why is -self.action_gradient needed here? grad_ys is -self.action_gradient, but you return self.unnormalized_actor_gradients as the actor gradients.
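For reference (a toy sketch, not from this repo; TF1 graph mode assumed), this is what the grad_ys argument of tf.gradients does: it weights each output's gradient before summing into the input's gradient, i.e. a vector-Jacobian product.

```python
import tensorflow as tf  # TF1 graph mode assumed

x = tf.Variable([1.0, 2.0])
y = 3.0 * x  # dy_i/dx_i = 3 elementwise

# Default grad_ys is a tensor of ones, so the result is just [3., 3.].
g_default = tf.gradients(y, x)
# With grad_ys, each output's gradient is weighted before being accumulated
# into x's gradient (a vector-Jacobian product): here [-1*3, -2*3] = [-3., -6.].
g_weighted = tf.gradients(y, x, grad_ys=tf.constant([-1.0, -2.0]))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([g_default, g_weighted]))  # [[3., 3.]], [[-3., -6.]]
```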
GoingMyWay commented
My understanding: in the paper, the actor policy update is

$$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^Q)\big|_{s=s_i,\, a=\mu(s_i)}\, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s_i}$$

and since J is the expected return and our target is to maximize J, in
self.unnormalized_actor_gradients = tf.gradients(
self.scaled_out, self.network_params, -self.action_gradient)
self.actor_gradients = list(map(lambda x: tf.div(x, self.batch_size), self.unnormalized_actor_gradients))
negating self.action_gradient is a trick that turns maximizing J into minimizing -J, which is what the gradient-descent optimizer actually does.
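Here is a self-contained toy sketch of that point (hypothetical variable names, TF1 graph mode assumed, not the repo's actual network): apply_gradients performs descent, so feeding -dQ/da as grad_ys makes the applied step move the actor parameter in the direction that increases Q, i.e. gradient ascent on J.

```python
import tensorflow as tf  # TF1 graph mode assumed

theta = tf.Variable(0.0)                          # toy actor parameter
action = 2.0 * theta                              # toy mu(s | theta)
action_gradient = tf.placeholder(tf.float32, ())  # dQ/da fed from a critic

# Same pattern as ActorNetwork: negate dQ/da so that a minimizing optimizer
# applies theta <- theta - lr * (-dQ/da * dmu/dtheta)
#                     = theta + lr * dQ/da * dmu/dtheta   (ascent on J).
actor_grads = tf.gradients(action, [theta], -action_gradient)
optimize = tf.train.AdamOptimizer(0.1).apply_gradients(zip(actor_grads, [theta]))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Suppose the critic says dQ/da = +1.0: increasing the action increases Q.
    sess.run(optimize, feed_dict={action_gradient: 1.0})
    print(sess.run(theta))  # > 0, so the action (and hence Q) increases
```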
In
self.unnormalized_actor_gradients = tf.gradients(
self.scaled_out, self.network_params, -self.action_gradient)
-self.action_gradient is the weight (grad_ys), as in the paper: part A is tf.gradients(self.scaled_out, self.network_params), i.e. $\nabla_{\theta^\mu} \mu(s \mid \theta^\mu)$, and part B is -self.action_gradient, i.e. the (negated) $\nabla_a Q(s, a \mid \theta^Q)$ from the critic.
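To make the "part A weighted by part B" reading concrete, here is a small check (toy values, TF1 graph mode assumed): passing grad_ys gives exactly the chain-rule product, i.e. the gradient of sum(mu * w) with w held constant, where w stands in for dQ/da.

```python
import numpy as np
import tensorflow as tf  # TF1 graph mode assumed

theta = tf.Variable([1.0, -2.0])   # toy actor parameters
mu = tf.tanh(theta)                # toy actor output (part A differentiates this)
w = tf.constant([0.5, -1.5])       # stands in for dQ/da (part B)

# Part A weighted by part B via grad_ys ...
g_weighted = tf.gradients(mu, theta, grad_ys=w)[0]
# ... equals the chain-rule product d/dtheta sum(mu * w) with w held constant.
g_chain = tf.gradients(tf.reduce_sum(mu * tf.stop_gradient(w)), theta)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a, b = sess.run([g_weighted, g_chain])
    print(np.allclose(a, b))  # True
```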