GuanLab/Leopard

dice_coef metric

Al-Murphy opened this issue · 2 comments

Hi,

What metric does the function dice_coef measure? I know it's used to monitor the model's performance on the training and validation sets, but I can't tell from the code what it is measuring. Is there a reason this measure is used to monitor performance rather than the cross entropy loss that model training aims to minimise?

Thanks,
Alan.

Hi Alan,
The dice_coef function is an adapted version of the Sørensen–Dice coefficient. We customized the formula for genomic data:

  1. The human genome contains blacklist regions, where the gold-standard labels are "-1" instead of "1"/"0". A mask was therefore introduced to exclude those regions: mask=K.cast(K.greater_equal(y_true_f,-0.5),dtype='float32'). For more information about the blacklist, see this paper.
  2. A zero denominator would cause an error, so ss=10 is added to both the numerator and the denominator to avoid it.
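
As a rough illustration of the two adjustments above, here is a minimal NumPy sketch rather than the repository's Keras code. It assumes the standard Dice form 2·|A∩B| / (|A| + |B|) with the mask applied to every sum; the function name `dice_coef_masked` and the parameter name `smooth` (standing in for ss) are hypothetical.

```python
import numpy as np

def dice_coef_masked(y_true, y_pred, smooth=10.0):
    """Masked, smoothed Dice coefficient (illustrative sketch, not the repo code)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    # 1.0 where the label is 0 or 1; 0.0 where it is the blacklist label -1
    mask = (y_true >= -0.5).astype(float)
    # Overlap between labels and predictions, counted only outside the blacklist
    intersection = np.sum(y_true * y_pred * mask)
    total = np.sum(y_true * mask) + np.sum(y_pred * mask)
    # Adding `smooth` to both parts keeps the ratio defined when `total` is zero
    return (2.0 * intersection + smooth) / (total + smooth)

# The last position is labelled -1, so its prediction is ignored.
score = dice_coef_masked([1, 1, 0, -1], [1.0, 0.0, 0.0, 1.0])
```

With this form, a window that contains no positive labels and no positive predictions scores 1.0 instead of dividing by zero.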

Actually, I tried the dice loss, but it performed worse than the cross entropy loss for this task. You can replace dice_coef with another function to monitor training. We simply keep it in the code as the "metric" and use cross entropy as the "loss" when compiling the model: loss=crossentropy_cut, metrics=[dice_coef].

Thanks,
Hongyang

Great, that clears things up for me. Thank you very much for the prompt response!