The loss becomes negative from positive values during the training loop
yijianSU22 opened this issue · 5 comments
Hi, I just ran a UNet model on a training set, using a combined Dice and cross-entropy loss as the loss function, but I found that the loss value is not normal: it gradually becomes negative. See below:
2024-04-27 22:54:02.697477: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2617/2617 [==============================] - 9485s 4s/step - loss: 0.3995 - accuracy: 0.0302 - val_loss: 0.3482 - val_accuracy: 0.0182
Epoch 2/20
2617/2617 [==============================] - 9453s 4s/step - loss: 0.1805 - accuracy: 0.2205 - val_loss: 0.1516 - val_accuracy: 0.9400
Epoch 3/20
2617/2617 [==============================] - 9428s 4s/step - loss: 0.0435 - accuracy: 0.9362 - val_loss: 0.1033 - val_accuracy: 0.9482
Epoch 4/20
2617/2617 [==============================] - 9412s 4s/step - loss: -0.0293 - accuracy: 0.9398 - val_loss: 0.0141 - val_accuracy: 0.9459
Epoch 5/20
2617/2617 [==============================] - 9444s 4s/step - loss: -0.0844 - accuracy: 0.9420 - val_loss: -0.0150 - val_accuracy: 0.9548
Epoch 6/20
2617/2617 [==============================] - 9436s 4s/step - loss: -0.1212 - accuracy: 0.9440 - val_loss: -0.0363 - val_accuracy: 0.9599
Epoch 7/20
2617/2617 [==============================] - 9397s 4s/step - loss: -0.1537 - accuracy: 0.9457 - val_loss: -0.0193 - val_accuracy: 0.9538
Epoch 8/20
2617/2617 [==============================] - 9305s 4s/step - loss: -0.1777 - accuracy: 0.9467 - val_loss: -0.0149 - val_accuracy: 0.9526
Epoch 9/20
2617/2617 [==============================] - 8968s 3s/step - loss: -0.2004 - accuracy: 0.9473 - val_loss: -0.0841 - val_accuracy: 0.9576
Epoch 10/20
2617/2617 [==============================] - 8787s 3s/step - loss: -0.2210 - accuracy: 0.9480 - val_loss: -0.0822 - val_accuracy: 0.9571
Epoch 11/20
2617/2617 [==============================] - 8794s 3s/step - loss: -0.2337 - accuracy: 0.9486 - val_loss: -0.0837 - val_accuracy: 0.9566
Epoch 12/20
2617/2617 [==============================] - 8809s 3s/step - loss: -0.2521 - accuracy: 0.9492 - val_loss: -0.0856 - val_accuracy: 0.9615
Epoch 13/20
2617/2617 [==============================] - 8804s 3s/step - loss: -0.2688 - accuracy: 0.9500 - val_loss: -0.1012 - val_accuracy: 0.9594
Epoch 14/20
2617/2617 [==============================] - 8807s 3s/step - loss: -0.2867 - accuracy: 0.9508 - val_loss: -0.0994 - val_accuracy: 0.9599
Epoch 15/20
2617/2617 [==============================] - 8721s 3s/step - loss: -0.2949 - accuracy: 0.9511 - val_loss: -0.1008 - val_accuracy: 0.9605
Epoch 16/20
2617/2617 [==============================] - 8684s 3s/step - loss: -0.3071 - accuracy: 0.9515 - val_loss: -0.0705 - val_accuracy: 0.9564
Epoch 17/20
349/2617 [===>..........................] - ETA: 37:27 - loss: -0.0398 - accuracy: 0.9501
And this is my loss function:
```python
class categorical_dicePcrossentropy_weight(tf.keras.losses.Loss):
    def __init__(self, class_weight, lamda=0.5):
        super().__init__()
        self.lamda = lamda
        self.weight = class_weight

    def call(self, y_true, y_pred):
        smooth = tf.constant(1.e-5, tf.float32)
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        intersection = tf.math.reduce_sum(y_pred * y_true, axis=(1, 2, 3))
        union = tf.math.reduce_sum(y_pred + y_true, axis=(1, 2, 3))
        dice_coef = tf.math.reduce_sum(2 * (intersection + smooth) / (union + smooth), axis=0)
        loss1 = tf.math.reduce_mean(self.weight * dice_coef)
        epsilon = 1.e-5
        output = y_pred / tf.math.reduce_sum(y_pred, axis=-1, keepdims=True)
        output = tf.clip_by_value(output, epsilon, 1 - epsilon)
        loss = y_true * tf.math.log(output)
        loss = tf.math.reduce_mean(loss, axis=(1, 2, 3))
        loss = tf.math.reduce_mean(loss, axis=0)
        loss2 = tf.math.reduce_mean(self.weight * loss)
        total_loss = (1 - self.lamda) * (1 - loss1) + self.lamda * loss2
        return total_loss
```
I don't know why. Is there a way to resolve it?
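For context, the `loss2` term as written can only be zero or negative, since it is a mean of `y_true * log(output)` and the log of a probability never exceeds zero. A minimal NumPy stand-in for a single one-hot pixel (hypothetical values, mirroring the `tf` ops):

```python
import numpy as np

# NumPy stand-in for the loss2 term above, for one hypothetical
# one-hot pixel. log of a probability is always <= 0, so this term
# (with no leading minus) can only pull the total loss downward.
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.8, 0.1])

output = y_pred / y_pred.sum()                 # renormalize predictions
output = np.clip(output, 1e-5, 1 - 1e-5)       # mirrors tf.clip_by_value
loss2 = np.mean(y_true * np.log(output))       # <= 0 as written
```

As the model gets more confident, `log(output)` at the true class approaches 0 from below, so this term dominates less but stays non-positive.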
- For small values, `tf.math.log(output)` is negative. `tf.clip_by_value()` does not work for `nan`: e.g. if `output` contains `nan`, then `tf.clip_by_value(output, epsilon, 1-epsilon)` also contains `nan`, if I'm not mistaken.
Thanks very much, yes, you're right. It should be `-y_true * tf.math.log(output)` here.
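With that sign flip, the cross-entropy term becomes non-negative. A minimal NumPy sketch of the corrected term (hypothetical helper name and values, standing in for the `tf` ops):

```python
import numpy as np

# Sketch of the corrected cross-entropy term with the leading minus
# sign; ce_term is a hypothetical name, not from the original code.
def ce_term(y_true, y_pred, epsilon=1e-5):
    output = y_pred / y_pred.sum(axis=-1, keepdims=True)
    output = np.clip(output, epsilon, 1 - epsilon)
    return np.mean(-y_true * np.log(output))  # note the leading minus

# Two hypothetical one-hot pixels: the term is now always >= 0.
y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
y_pred = np.array([[0.2, 0.8], [0.9, 0.1]])
ce = ce_term(y_true, y_pred)
```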
Hi, sorry to bother you again. I don't know why, but even when I use `tf.keras.losses.CategoricalCrossentropy()` to compute the CE term, the loss value still becomes negative during the training loop.
Hi @yijianSU22,
The op `tf.math.log(x)` outputs `-inf` if the value of `x` is 0, and `nan` if `x < 0`. You can clip `-inf` values to a value you want using `tf.clip_by_value`, but for `nan`, `clip_by_value` also returns `nan`. Since this is a custom loss function, maybe you need to recheck it.
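The behavior described above can be reproduced with the NumPy equivalents of these ops (a minimal sketch; the TF ops follow the same IEEE float rules):

```python
import numpy as np

# log(0) gives -inf, log of a negative value gives nan, and clipping
# does not repair nan: it propagates straight through the clip.
with np.errstate(divide="ignore", invalid="ignore"):
    print(np.log(0.0))                       # -inf
    print(np.log(-1.0))                      # nan
    print(np.clip(np.log(-1.0), 1e-5, 1.0))  # still nan
```

So clipping must happen *before* the log (as the original code does), and any `nan` already present in `y_pred` has to be traced back to its source rather than clipped away.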