simoninithomas/Deep_reinforcement_learning_Course

Can anyone explain why we have `self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)` in Deep Q learning with Doom.ipynb

ParmpalGill opened this issue · 1 comment

Why multiply by the actions and use `reduce_sum` instead of `argmax`?

I think it's because `actions_` is a one-hot vector, with a 1 only at the chosen action. So the multiplication gives a vector that is zero everywhere except in one place, which holds that action's Q-value. The `reduce_sum` then just pulls that number out, since all the other entries are zero.
What do you think?
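
To make that concrete, here is a minimal sketch of the trick with made-up numbers (the values of `output` and `actions_` below are hypothetical; in the notebook, `output` comes from the network and `actions_` is the one-hot action batch fed in):

```python
import tensorflow as tf

# Hypothetical batch of 2 states with 3 possible actions.
# Each row of `output` holds the network's predicted Q-values for every action.
output = tf.constant([[1.5, 0.2, -0.3],
                      [0.7, 2.1,  0.4]])

# One-hot encoding of the actions actually taken (action 0, then action 1).
actions_ = tf.constant([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])

# Element-wise multiply zeroes out every Q-value except the chosen action's...
masked = tf.multiply(output, actions_)  # [[1.5, 0.0, 0.0], [0.0, 2.1, 0.0]]

# ...and summing along the action axis extracts that single value per sample.
Q = tf.reduce_sum(masked, axis=1)       # [1.5, 2.1]

print(Q.numpy())  # [1.5 2.1]
```

On the `argmax` part of the question, I'd add that during training you need the Q-value of the action that was actually taken (which may have been an exploratory one), not the greedy action, so masking with the stored one-hot action is the right tool here.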