
Variable Tensor("Neg:0", shape=(), dtype=float32) has `None` for gradient.

I get the following error when running reinforce_agent.py and all of the scripts under 3-atari. Before this, the same lines raised a Tensor-to-NumPy-array error, which I resolved by adding:

from tensorflow.python.framework.ops import disable_eager_execution

With the above in place, I now hit this error instead. I've tried various fixes, including calling K.eval(loss) beforehand, but that caused other issues. My TensorFlow version is 2.4.1, Keras 2.4.3, and NumPy 1.19.5.

Model: "sequential"
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 24)                384       
dense_1 (Dense)              (None, 24)                600       
dense_2 (Dense)              (None, 5)                 125       
Total params: 1,109
Trainable params: 1,109
Non-trainable params: 0
Traceback (most recent call last):
  File "1-grid-world/7-reinforce/reinforce_agent.py", line 95, in <module>
    agent = ReinforceAgent()
  File "1-grid-world/7-reinforce/reinforce_agent.py", line 28, in __init__
    self.optimizer = self.optimizer()
  File "1-grid-world/7-reinforce/reinforce_agent.py", line 55, in optimizer
    updates = optimizer.get_updates(self.model.trainable_weights, loss)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 727, in get_updates
    grads = self.get_gradients(loss, params)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 719, in get_gradients
    raise ValueError("Variable {} has `None` for gradient. "
ValueError: Variable Tensor("Neg:0", shape=(), dtype=float32) has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

Any solution to this? @Hyeokreal @keon

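This was resolved with the code change below. In TF 2.x, get_updates takes (loss, params), so the positional call shown in the traceback passed the weights as the loss and the loss as the params, which is why the error names the loss tensor "Neg:0" as the variable. Passing both by keyword fixes it: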

    def optimizer(self):
        action = K.placeholder(dtype=float, shape=(None, 5))
        discounted_rewards = K.placeholder(shape=(None,))

        # Calculate cross entropy error function
        action_prob = K.sum(action * self.model.output, axis=1) 
        cross_entropy = K.log(action_prob) * discounted_rewards
        loss = -K.sum(cross_entropy)

        # create training function (loss and params passed by keyword)
        optimizer = Adam(lr=self.learning_rate)
        updates = optimizer.get_updates(params=self.model.trainable_weights,
                                        loss=loss)
        train = K.function(inputs=[self.model.input, action, discounted_rewards],
                           outputs=self.model.output,
                           updates=updates)

        return train
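
For anyone who wants to sanity-check the loss by hand, here is a minimal sketch in plain Python (no TensorFlow) of what the three loss lines above compute. The function name reinforce_loss and the sample numbers are my own and not from the repo; I'm assuming the model output is a per-step softmax over the 5 actions and that action is a one-hot row for the action taken.

```python
import math


def reinforce_loss(action_probs, actions_one_hot, discounted_rewards):
    """Plain-Python mirror of the loss built in optimizer() above.

    action_probs:       per-step softmax outputs of the policy network
    actions_one_hot:    per-step one-hot encoding of the action taken
    discounted_rewards: per-step discounted return
    (hypothetical helper, for illustration only)
    """
    total = 0.0
    for probs, one_hot, g in zip(action_probs, actions_one_hot, discounted_rewards):
        # K.sum(action * self.model.output, axis=1):
        # picks out the probability of the action that was taken
        chosen_prob = sum(a * p for a, p in zip(one_hot, probs))
        # K.log(action_prob) * discounted_rewards
        total += math.log(chosen_prob) * g
    # loss = -K.sum(cross_entropy)
    return -total


# Made-up two-step episode with 5 actions (illustrative numbers only).
probs = [[0.5, 0.125, 0.125, 0.125, 0.125],
         [0.25, 0.25, 0.25, 0.125, 0.125]]
taken = [[1, 0, 0, 0, 0],
         [0, 1, 0, 0, 0]]
returns = [1.0, 2.0]

loss = reinforce_loss(probs, taken, returns)
print(loss)
```

With these numbers the loss is -(ln 0.5 * 1 + ln 0.25 * 2) = 5 ln 2, roughly 3.466. The loss depends on the model output only through the chosen-action probability, so gradients flow back to the weights as long as get_updates receives the loss and the params in the right slots.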