hkmztrk/DeepDTA

Question about cindex_score function

Opened this issue · 3 comments

Hi! Thank you very much for the awesome work!

I have a question about the c-index function below.

def cindex_score(y_true, y_pred):

My question is: should the line tf.cast(g == 0.0, tf.float32) be tf.cast(tf.math.equal(g, 0.0), tf.float32) instead?
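Some context on why this line matters. As far as I know, TensorFlow 1.x does not overload == on tensors, because Tensor.__eq__ compares object identity. That means g == 0.0 evaluates to the plain Python value False, so tied predictions get no 0.5 credit. TF 2.x compares element-wise by default. The minimal pure-Python sketch below shows how much the tie credit alone can move the score. The helper name cindex_with_ties and its tie_weight parameter are hypothetical. Pairs are selected the same way as in the get_cindex function posted later in this thread, and the data are the arrays from that post.

```python
# Hypothetical helper: pairwise c-index with a configurable credit for tied predictions.
# Pair selection mirrors get_cindex: only i > j with y_true[i] > y_true[j] is counted.
def cindex_with_ties(y_true, y_pred, tie_weight=0.5):
    num, den = 0.0, 0
    for i in range(1, len(y_true)):
        for j in range(i):
            if y_true[i] > y_true[j]:
                den += 1
                if y_pred[i] > y_pred[j]:
                    num += 1.0          # correctly ordered pair
                elif y_pred[i] == y_pred[j]:
                    num += tie_weight   # tied prediction
    return num / den if den else 0.0


ypred = [.1, .41, .3, .2, 0.0, .1, 0.0, .41, .3, .2]
ytrue = [.32, .63, .9, .8, 0.0, .11, .41, .32, .2, 0.0]

print(cindex_with_ties(ytrue, ypred, tie_weight=0.5))  # 8.5 / 13
print(cindex_with_ties(ytrue, ypred, tie_weight=0.0))  # 8 / 13
```

With tie_weight=0.5 the result is 8.5/13 ≈ 0.6538, the "deepdta 1" value reported later in the thread. With tie_weight=0.0 it is 8/13 ≈ 0.6154, which matches "deepdta 2".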

Thank you very much in advance.

Kyohei

Hi @Kotober, thanks a lot for your interest in DeepDTA.

I think both should work, since they do the same thing. Did you try it with tf.cast(tf.math.equal(g, 0.0), tf.float32)? Does the c-index change?

Hi!

Yes, it definitely changed.
I assume you took the code from this Stack Overflow answer, but that code is probably incorrect:
https://stackoverflow.com/questions/43576922/keras-custom-metric-iteration/43591066#43591066
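The matrix formulation itself can be checked outside TensorFlow. In NumPy, == on arrays is always element-wise, so the same expand_dims / lower-triangle construction does give ties their 0.5 credit. This is a sketch that assumes NumPy is available. The function name cindex_matrix is hypothetical, and np.tril plays the role of tf.matrix_band_part(x, -1, 0).

```python
import numpy as np


def cindex_matrix(y_true, y_pred):
    """Hypothetical NumPy helper mirroring the construction used in cindex_score."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # g[i, j] = y_pred[i] - y_pred[j]  (like tf.expand_dims(y_pred, -1) - y_pred)
    g = y_pred[:, None] - y_pred[None, :]
    # NumPy's == is element-wise: tied predictions score 0.5, correctly ordered ones 1
    g = (g == 0.0) * 0.5 + (g > 0.0) * 1.0
    # f[i, j] = 1 where y_true[i] > y_true[j]; np.tril keeps only i >= j,
    # like tf.matrix_band_part(f, -1, 0)
    f = np.tril((y_true[:, None] - y_true[None, :] > 0.0).astype(np.float64))
    den = f.sum()
    return float((g * f).sum() / den) if den > 0 else 0.0


ypred = [.1, .41, .3, .2, 0.0, .1, 0.0, .41, .3, .2]
ytrue = [.32, .63, .9, .8, 0.0, .11, .41, .32, .2, 0.0]
print(cindex_matrix(ytrue, ypred))  # 8.5 / 13
```

On the arrays from the test script in this thread it returns 8.5/13 ≈ 0.6538, the same value as get_cindex.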

Would it be possible for you to check the c-index function again yourself? I guess you are developing another method that builds on DeepDTA. I would appreciate it if you could publish the modified function.

Also, which function did you use to compute the values reported in the DeepDTA paper: cindex_score or get_cindex?


Try this block of code for reference. The functions give different outputs:

deepdta 1: 0.6538461538461539
deepdta 2: 0.61538464
new function:  0.65384614
lifelines tool 0.6395348837209303

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.matrix_band_part)
import numpy as np

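def cindex_score(y_true, y_pred):
    # g[i, j] = y_pred[i] - y_pred[j]
    g = tf.subtract(tf.expand_dims(y_pred, -1), y_pred)
    # NOTE: in TF 1.x, `g == 0.0` is a Python identity check that returns False,
    # so tied predictions get 0 instead of 0.5
    g = tf.cast(g == 0.0, tf.float32) * 0.5 + tf.cast(g > 0.0, tf.float32)
    # f[i, j] = 1 where y_true[i] > y_true[j]; band_part(-1, 0) keeps the lower triangle (i >= j)
    f = tf.subtract(tf.expand_dims(y_true, -1), y_true) > 0.0
    f = tf.matrix_band_part(tf.cast(f, tf.float32), -1, 0)
    g = tf.reduce_sum(tf.multiply(g, f))
    f = tf.reduce_sum(f)
    return tf.where(tf.equal(g, 0), 0.0, g/f) #select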

def cindex_score_correction(y_true, y_pred):
    g = tf.subtract(tf.expand_dims(y_pred, -1), y_pred)
    # tf.math.equal compares element-wise, so tied predictions get 0.5
    g = tf.cast(tf.math.equal(g, 0.0), tf.float32) * 0.5 + tf.cast(g > 0.0, tf.float32)
    f = tf.subtract(tf.expand_dims(y_true, -1), y_true) > 0.0
    f = tf.matrix_band_part(tf.cast(f, tf.float32), -1, 0)
    g = tf.reduce_sum(tf.multiply(g, f))
    f = tf.reduce_sum(f)
    return tf.where(tf.equal(g, 0), 0.0, g/f) #select

def get_cindex(Y, P):
    summ = 0
    pair = 0
    # only pairs with j < i and Y[i] > Y[j] are counted
    for i in range(1, len(Y)):
        for j in range(0, i):
            if i != j:
                if Y[i] > Y[j]:
                    pair += 1
                    summ += 1 * (P[i] > P[j]) + 0.5 * (P[i] == P[j])
    if pair != 0:
        return summ / pair
    else:
        return 0

ypred = np.array([.1, .41, .3, .2,0.0, .1, 0.0,.41, .3, .2])
ytrue = np.array([.32,.63, .9, .8,0.0, .11,.41, .32,.2, 0.0])
ypred_ten = tf.convert_to_tensor(ypred, dtype=tf.float32)
ytrue_ten = tf.convert_to_tensor(ytrue, dtype=tf.float32)

print('deepdta 1:', get_cindex(ytrue, ypred))
c_deepdta = cindex_score(y_true=ytrue_ten, y_pred=ypred_ten)
print('deepdta 2:', tf.Session().run(c_deepdta))

c = cindex_score_correction(y_true=ytrue_ten, y_pred=ypred_ten)
print('new function: ', tf.Session().run(c))

from lifelines.utils import concordance_index
# https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#model-selection-based-on-predictive-power
print('lifelines tool',concordance_index(ytrue, ypred))
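The remaining gap to lifelines seems to come from pair selection rather than ties. get_cindex only loops over j < i, and cindex_score keeps only the lower triangle. As a result, a pair is counted only when the later-indexed sample has the larger true value. lifelines' concordance_index, with event_observed left at its default so that every sample is treated as an observed event, instead scores every pair whose true values differ. The pure-Python sketch below does that all-pairs count. The helper name cindex_all_pairs is hypothetical.

```python
# Hypothetical helper: c-index over every ordered pair (i, j) with y_true[i] > y_true[j],
# i.e. all pairs whose true values differ; pairs with tied true values are skipped.
def cindex_all_pairs(y_true, y_pred):
    num, den = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                den += 1
                if y_pred[i] > y_pred[j]:
                    num += 1.0   # concordant pair
                elif y_pred[i] == y_pred[j]:
                    num += 0.5   # tied prediction
    return num / den if den else 0.0


ypred = [.1, .41, .3, .2, 0.0, .1, 0.0, .41, .3, .2]
ytrue = [.32, .63, .9, .8, 0.0, .11, .41, .32, .2, 0.0]
print(cindex_all_pairs(ytrue, ypred))  # 27.5 / 43
```

On the data above this finds 43 comparable pairs with a total score of 27.5. That gives 27.5/43 ≈ 0.6395, which agrees with the lifelines output. The two DeepDTA functions only see 13 of those pairs.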

Thank you very much!

Hi @Kotober, thank you! I really appreciate your effort on this!

Well, this is interesting. get_cindex is my original implementation without TensorFlow. I used both it and cindex_score for monitoring, and they usually produced scores with only slight differences.

For the paper, I remember using get_cindex values, because that was the function I used for the other methods, but I have to make sure. I remember using cindex_score to monitor training. I'll check this myself soon and let you know.

Best!