High Number of False Positives for Binary Class with Softmax Layer
burhr2 opened this issue · 1 comment
Hello, thanks for the excellent repo you have put together. We are working on a 3D binary segmentation task: detecting lesions in spinal cord MRI images. We have a class-imbalance situation, with the lesion (foreground) class far less represented than the background class: the proportion of foreground voxels per 48x48x48 training patch is 0.9%, averaged over patches that contain lesions.

We are using a 3D U-Net model with a sigmoid output, which works well. When updating the 3D U-Net to a softmax output, there is a tendency towards many more false positive predictions compared to the sigmoid output. We train on randomly selected patches, so we can easily have training patches containing only the background class. Can you give some insight, or point out whether we are doing something wrong (please see the updated softmax output code below)?

My intuition is that since the background class covers a large proportion of voxels, the model tends to learn the background class more than the foreground, even with the penalties from the losses. For example, using asymmetric_focal_loss resulted in a model predicting only the background class. Another example is a dice_coefficient calculated per class and then averaged: the good Dice from the background class seems to dominate the average and hide the poor foreground score. A toy illustration of this effect is sketched just below.
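A toy illustration of that averaging effect, assuming the ~0.9% foreground proportion from our patches and a hypothetical model that misses half of the lesion voxels:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient for boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

n = 48 ** 3                           # voxels in one 48x48x48 patch
gt_fg = np.zeros(n, dtype=bool)
gt_fg[: int(0.009 * n)] = True        # ~0.9% foreground voxels

pred_fg = gt_fg.copy()
pred_fg[: gt_fg.sum() // 2] = False   # hypothetical: miss half of the lesion

fg, bg = dice(pred_fg, gt_fg), dice(~pred_fg, ~gt_fg)
print(fg, bg, (fg + bg) / 2)          # ~0.67, ~0.998, average still ~0.83
```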
Lesion-level results with default parameters
TP = true positive
FP = false positive
FN = false negative
GT = number of lesions in the ground truth
| No. | Loss | TP | FP | FN | GT |
|---|---|---|---|---|---|
| 1 | asymmetric_unified_loss | 31 | 580 | 26 | 57 |
| 2 | symmetric_unified_loss | 23 | 183 | 34 | 57 |
| 3 | asymmetric_focal_tversky_loss | 29 | 358 | 28 | 57 |
| 4 | asymmetric_focal_loss | 0 | 1166 | 57 | 57 |
| 5 | symmetric_focal_tversky_loss | 0 | 0 | 57 | 57 |
| 6 | tversky_loss | 24 | 578 | 33 | 57 |
| 7 | combo_loss | 22 | 267 | 35 | 57 |
| 8 | focal_tversky_loss | 27 | 281 | 30 | 57 |
| 9 | focal_loss | 7 | 48 | 50 | 57 |
| 10 | symmetric_focal_loss | 0 | 923 | 57 | 57 |
| 11 | dice_loss | 31 | 382 | 26 | 57 |
The input image and the two-channel mask:

```python
import numpy as np
import tensorflow as tf

# Sigmoid version (shapes)
# input_img:           (1, 48, 48, 48, 1)
# single_channel_mask: (1, 48, 48, 48, 1)

# Softmax version: to_categorical appends a new class axis, so the singleton
# channel axis is squeezed first; otherwise the result would have shape
# (1, 48, 48, 48, 1, 2) instead of the intended (1, 48, 48, 48, 2)
two_channel_mask = tf.keras.utils.to_categorical(
    np.squeeze(single_channel_mask, axis=-1), num_classes=2
)

# Inputs of the model (shapes)
# input_img:        (1, 48, 48, 48, 1)
# two_channel_mask: (1, 48, 48, 48, 2)  # 1st channel background, 2nd foreground
```
3D U-Net:

```python
import tensorflow as tf

# Define the global variables
KERNEL_SIZE = (3, 3, 3)
POOLING_SIZE = (2, 2, 2)
FILTERS = [16, 32, 64]
shape = (48, 48, 48, 1)
depth = 2

# `dropout`, `lr`, `get_down_block`, `get_up_block`, `asym_unified_focal_loss`,
# and `dice_coefficient` are defined elsewhere in our code.

def unet3D_softmax(num_classes=2):
    """Whole U-Net architecture assembled from the predefined blocks."""
    input = tf.keras.layers.Input(shape=shape)
    layer = input
    hist = []
    # Encoder: downsampling blocks, saving skip connections
    for i in range(depth):
        (layer, save) = get_down_block(i, layer, dropout=dropout)
        hist.append(save)
    # Bottleneck: two Conv3D -> BatchNorm -> ReLU blocks
    layer = tf.keras.layers.Conv3D(FILTERS[depth], KERNEL_SIZE, padding="same")(layer)
    layer = tf.keras.layers.BatchNormalization()(layer)
    layer = tf.keras.layers.Activation("relu")(layer)
    layer = tf.keras.layers.Conv3D(FILTERS[depth] * 2, KERNEL_SIZE, padding="same")(layer)
    layer = tf.keras.layers.BatchNormalization()(layer)
    layer = tf.keras.layers.Activation("relu")(layer)
    # Decoder: upsampling blocks fed with the saved skip connections
    for i in reversed(range(depth)):
        layer = get_up_block(layer, hist[i], i, dropout=dropout)
    layer = tf.keras.layers.Dropout(dropout)(layer)
    # Output head: sigmoid for a single-channel mask, softmax for one-hot channels
    if num_classes == 1:  # binary
        activation = "sigmoid"
    else:
        activation = "softmax"
    layer = tf.keras.layers.Conv3D(num_classes, 1, padding="same", activation=activation)(layer)
    model = tf.keras.Model(inputs=input, outputs=layer)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    model.compile(
        optimizer=optimizer,
        loss=asym_unified_focal_loss(),
        metrics=[dice_coefficient()],
    )
    return model
```
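As a quick sanity check (hypothetical usage of the function above):

```python
# Build the two-class model and confirm the output shape of the softmax head.
model = unet3D_softmax(num_classes=2)
model.summary()  # final layer should report output shape (None, 48, 48, 48, 2)
```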
During inference:

```python
import numpy as np

predictions_list = []
for patch in test_image_patches:  # each patch has shape (1, 48, 48, 48, 1)
    single_patch_prediction = model.predict(patch)
    # Prediction shape: (1, 48, 48, 48, 2) softmax probabilities
    single_patch_prediction_argmax = np.argmax(single_patch_prediction, axis=-1)
    # Output: (1, 48, 48, 48); add back a channel axis for compatibility
    # with our pipeline -> (1, 48, 48, 48, 1)
    single_patch_prediction_argmax = np.expand_dims(single_patch_prediction_argmax, axis=-1)
    predictions_list.append(single_patch_prediction_argmax)
```
Hi Burhan,
Thank you very much for your question; it sounds like an interesting project!
It is surprising, because I would not expect differences between using sigmoid or softmax as the last activation, given that softmax with two outputs is effectively equivalent to using a sigmoid activation; a quick numerical check of this equivalence is sketched below.
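A minimal check of that equivalence, on toy logits: for two logits (z0, z1), the softmax probability of class 1 equals sigmoid(z1 - z0), so a two-class softmax head carries the same information as a single sigmoid output.

```python
import numpy as np

# Toy check: softmax over two logits gives the same foreground probability
# as a sigmoid applied to the logit difference.
rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))  # logits for 5 voxels, 2 classes

softmax_fg = np.exp(z[:, 1]) / np.exp(z).sum(axis=1)
sigmoid_of_diff = 1.0 / (1.0 + np.exp(-(z[:, 1] - z[:, 0])))

print(np.allclose(softmax_fg, sigmoid_of_diff))  # True
```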
Just a few things I would like to check first:
- What are the results you are getting with the sigmoid layer?
- Have you tried the same loss function (although perhaps requiring different code) with sigmoid and softmax layers separately?
- Have you repeated any of these experiments or used cross validation (same loss function and activation)? It might be useful to see how the performance varies despite using the same parameters.
Unrelated to loss functions, two suggestions that come to mind that might help with performance:
- The U-Net depth used is quite low, which might cause problems with learning this difficult task. I wonder if you have tried using higher depths like 3 or 4?
- With patch-based predictions, I have found it useful to make overlapping predictions, average the activations (i.e. probabilities), and then apply argmax. I found this particularly useful for reducing false positives. The functions to do that can be found in: https://github.com/frankkramer-lab/MIScnn/blob/master/miscnn/utils/patch_operations.py (a minimal sketch of the idea follows below).
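A minimal sketch of that overlap-averaging idea (this is not the MIScnn API; `volume_shape`, `patch_coords`, and `model` are hypothetical placeholders, and the 48-voxel patch size is assumed from the issue):

```python
import numpy as np

# Accumulate softmax probabilities from overlapping patches into full-volume
# buffers, divide by the per-voxel overlap count, then argmax once at the end.
prob_sum = np.zeros(volume_shape + (2,), dtype=np.float32)  # (D, H, W, 2)
count = np.zeros(volume_shape + (1,), dtype=np.float32)     # per-voxel overlap

for (z, y, x), patch in patch_coords:   # patch: (1, 48, 48, 48, 1) input
    probs = model.predict(patch)[0]     # (48, 48, 48, 2) softmax output
    prob_sum[z:z+48, y:y+48, x:x+48] += probs
    count[z:z+48, y:y+48, x:x+48] += 1.0

mean_probs = prob_sum / count                  # average probabilities per voxel
segmentation = np.argmax(mean_probs, axis=-1)  # (D, H, W) label map
```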
Best wishes,
Michael