dice_coef metric
Al-Murphy opened this issue · 2 comments
Hi,
What metric is the `dice_coef` function measuring? I know it is used to measure the model's performance on the training and validation sets, but I can't tell from the code what it is computing. Is there a reason this measure is used to monitor performance rather than the cross-entropy loss that training aims to minimise?
Thanks,
Alan.
Hi Alan,
The `dice_coef` function is an adapted version of the Sørensen–Dice coefficient. We customized the calculation formula for genomic data (a sketch of the resulting function follows the list):
- There are blacklist regions in the human genome, where the gold-standard labels are "-1" instead of "1"/"0". Therefore a mask was introduced to exclude those regions via `mask=K.cast(K.greater_equal(y_true_f,-0.5),dtype='float32')`. For more information about the blacklist, see this paper.
- When the denominator is zero, there will be a division error. The smoothing term `ss=10` was added to both the numerator and the denominator to avoid such errors.
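Putting the two adjustments together, a masked Dice coefficient could look like the sketch below. This is a reconstruction from the snippets above, not necessarily the repository's exact code; the flattening step and the TensorFlow 2 import are assumptions.

```python
from tensorflow.keras import backend as K

ss = 10  # smoothing constant from this thread; keeps the ratio finite when both sums are zero

def dice_coef(y_true, y_pred):
    # Flatten labels and predictions to 1-D so the sums run over every position.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    # Blacklist positions carry the label -1; the mask zeroes them out of every sum.
    mask = K.cast(K.greater_equal(y_true_f, -0.5), dtype='float32')
    intersection = K.sum(y_true_f * y_pred_f * mask)
    # Sørensen–Dice: 2*|A∩B| / (|A| + |B|), with ss added to numerator and denominator.
    return (2. * intersection + ss) / (K.sum(y_true_f * mask) + K.sum(y_pred_f * mask) + ss)
```

Without the mask, the -1 labels would subtract from the sums and could even push the coefficient negative; without `ss`, a batch with no positive labels would produce 0/0.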
Actually, I tried the Dice loss, but it was worse than the cross-entropy loss for this task. You can replace `dice_coef` with other functions to monitor the training. We simply keep it in the code as the "metric" and use cross entropy as the "loss" when compiling the model: `loss=crossentropy_cut, metrics=[dice_coef]`. A sketch of that compile step follows.
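For context, the compile call might look like the sketch below, paired with a blacklist-masked binary cross entropy of the kind described above. The internals shown for `crossentropy_cut` are an assumption based on the masking convention in this thread, as are the clipping bounds, the optimizer choice, and the `model` object; consult the repository for the actual definition.

```python
from tensorflow.keras import backend as K

def crossentropy_cut(y_true, y_pred):
    # Same blacklist convention as dice_coef: positions labeled -1 are masked out.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.clip(K.flatten(y_pred), 1e-7, 1. - 1e-7)  # avoid log(0)
    mask = K.cast(K.greater_equal(y_true_f, -0.5), dtype='float32')
    # Element-wise binary cross entropy, averaged over unmasked positions only
    # (assumes each batch contains at least one unmasked position).
    ce = -(y_true_f * K.log(y_pred_f) + (1. - y_true_f) * K.log(1. - y_pred_f))
    return K.sum(ce * mask) / K.sum(mask)

# `model` is assumed to be an already-built Keras model.
model.compile(optimizer='adam', loss=crossentropy_cut, metrics=[dice_coef])
```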
Thanks,
Hongyang
Great, that clears things up for me. Thank you very much for the prompt response!