pythonlessons/Reinforcement_Learning

Dueling question


Hello,
Thanks for a great project. It's very useful. I have a question about the model code for the Dueling architecture, for example in Pong-v0_DQN_CNN_TF2.py.

Here is the line in question:
action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(action_space,))(action_advantage)

Let's say our batch looks like this:
a = tf.constant([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]])
print('a=', a)
a= tf.Tensor(
[[ 1.  2.]
 [-2.  3.]
 [ 3. -4.]], shape=(3, 2), dtype=float32)

Without an axis argument, K.mean averages over every element of the batch, so the result is a tensor with shape (1, 1):
print('Kmean=', K.mean(a[:, :], keepdims=True))
Kmean= tf.Tensor([[0.5]], shape=(1, 1), dtype=float32)

Shouldn't the mean be a tensor with shape (3, 1), computed with axis=1?
print('Kmean=', K.mean(a[:, :], axis=1, keepdims=True))
Kmean= tf.Tensor(
[[ 1.5]
 [ 0.5]
 [-0.5]], shape=(3, 1), dtype=float32)

If our batch contains 3 elements, the mean should be calculated separately for each element of the batch, i.e. over the action axis. Or am I missing something?
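
To make the difference concrete without depending on TensorFlow, here is a minimal sketch in plain Python that reproduces both reductions on the same example batch. The helper names (global_mean, row_means, center_global, center_per_sample) are hypothetical and only illustrate the arithmetic; they are not taken from the repository.

```python
# Plain-Python sketch (no TensorFlow) of the two reductions discussed above.
# `batch` mirrors the example tensor: 3 samples, 2 actions each.
batch = [[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]]


def global_mean(rows):
    """Mean over every entry, analogous to K.mean(a, keepdims=True)."""
    flat = [x for row in rows for x in row]
    return sum(flat) / len(flat)


def row_means(rows):
    """Mean over the action axis, analogous to K.mean(a, axis=1, keepdims=True)."""
    return [sum(row) / len(row) for row in rows]


def center_global(rows):
    """Subtract the single batch-wide mean from every advantage."""
    m = global_mean(rows)
    return [[x - m for x in row] for row in rows]


def center_per_sample(rows):
    """Subtract each sample's own mean from that sample's advantages."""
    return [[x - m for x in row] for row, m in zip(rows, row_means(rows))]


print(global_mean(batch))        # 0.5
print(row_means(batch))          # [1.5, 0.5, -0.5]
print(center_global(batch))      # [[0.5, 1.5], [-2.5, 2.5], [2.5, -4.5]]
print(center_per_sample(batch))  # [[-0.5, 0.5], [-2.5, 2.5], [3.5, -3.5]]
```

With the per-sample version every row sums to zero, so the centered advantages of one sample do not depend on which other samples happen to be in the batch. With the batch-wide mean they do.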

Same here.