pemami4911/deep-rl

In tf.gradients, why is -self.action_gradient needed?

GoingMyWay opened this issue · 1 comment

Hi, in the code of ActorNetwork:

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)

Why is -self.action_gradient needed here? grad_ys is -self.action_gradient, but you return self.unnormalized_actor_gradients.
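
For context on what that third argument does: grad_ys weights each element of ys before the gradients are summed, so tf.gradients returns a weighted (vector-Jacobian) product rather than the plain gradient. A minimal sketch, assuming TensorFlow 1.x (the version this repo targets); the toy tensors are made up for illustration:

    import tensorflow as tf

    x = tf.Variable([1.0, 2.0])
    y = 3.0 * x                         # dy_i/dx_i = 3 elementwise
    w = tf.constant([10.0, -1.0])       # plays the role of grad_ys

    g_plain = tf.gradients(y, x)        # sum_i dy_i/dx        -> [3., 3.]
    g_weighted = tf.gradients(y, x, w)  # sum_i w_i * dy_i/dx  -> [30., -3.]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(g_plain))        # [array([3., 3.], dtype=float32)]
        print(sess.run(g_weighted))     # [array([30., -3.], dtype=float32)]

With grad_ys = -self.action_gradient, each parameter gradient of the policy output is weighted by -∂Q/∂a, which is exactly the chain-rule product the DDPG actor update needs (up to the sign discussed below).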

My understanding is that, in the paper, the actor policy update is

$$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^Q)\Big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\Big|_{s_i}$$

and since J is the expected return and our goal is to maximize J, in

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)
        self.actor_gradients = list(map(lambda x: tf.div(x, self.batch_size), self.unnormalized_actor_gradients))

negating self.action_gradient is a good trick: the optimizer can only minimize, so descending along the gradient of -J is the same as ascending along the gradient of J (see the sketch below).
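
To see why the minus sign is the right trick for a minimizing optimizer, here is a self-contained toy sketch (TensorFlow 1.x assumed; the objective and optimizer are made up for illustration, not taken from the repo):

    import tensorflow as tf

    theta = tf.Variable(0.0)
    J = -(theta - 2.0) ** 2                 # toy objective, maximized at theta = 2
    grad_J = tf.gradients(J, theta)[0]      # ascent direction

    # apply_gradients always *descends*, so hand it the negated gradient.
    opt = tf.train.GradientDescentOptimizer(0.1)
    ascend = opt.apply_gradients([(-grad_J, theta)])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(ascend)
        print(sess.run(theta))              # close to 2.0, i.e. J was maximized

In the repo the same thing happens one step earlier: the minus sign is baked into grad_ys, presumably so the resulting actor_gradients can be handed straight to a standard minimizing optimizer.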

In

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)

-self.action_gradient is the grad_ys weight, corresponding (up to the sign flip) to the ∇_a Q factor in the paper:

$$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \underbrace{\nabla_a Q(s, a \mid \theta^Q)\Big|_{s=s_i,\, a=\mu(s_i)}}_{\text{part B}} \; \underbrace{\nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\Big|_{s_i}}_{\text{part A}}$$

Part A is tf.gradients(self.scaled_out, self.network_params) and part B is -self.action_gradient.
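
This A-times-B reading can be checked numerically: when Q depends on the actor parameters only through the action, tf.gradients(mu_out, params, grad_ys=dQ/da) equals tf.gradients(Q, params). A self-contained sketch (TensorFlow 1.x; the tiny one-vector "networks" below are invented for the example, not the repo's):

    import tensorflow as tf

    theta = tf.Variable([0.5, -0.3])        # toy actor parameters
    s = tf.constant([1.0, 2.0])             # a fixed state
    a = tf.reduce_sum(theta * s)            # toy deterministic policy mu(s)
    q = -(a - 1.0) ** 2                     # toy critic Q(s, a)

    dq_da = tf.gradients(q, a)[0]                        # critic-side action gradient (part B)
    direct = tf.gradients(q, theta)[0]                   # dQ/dtheta computed directly
    chained = tf.gradients(a, theta, grad_ys=dq_da)[0]   # part A weighted by part B

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run([direct, chained]))  # both print [2.2, 4.4]

This mirrors the DDPG setup, where the critic network produces the action gradient that is then fed, negated, into the actor's tf.gradients call.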