therneau/survival

Inconsistent results of concordance function

Closed this issue · 2 comments

We're using the concordance function on a training data set (n ~ 200,000) and evaluating concordance on a test set (n ~ 100,000). When running concordance with "newdata" we don't always get the same result, even after setting a seed.

The influence matrix returned by concordance when the newdata option is specified sometimes contains negative values, non-integer values, impossibly large or small values, or NaNs. Since these are counts, this should never happen: each count should be bounded by the possible number of comparisons, i.e. lie in the range [0, (n^2 - n)/2].
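A quick way to screen a matrix for the symptoms listed above is sketched here. It is only an illustration: `check_count_matrix` is a hypothetical helper, not part of the survival package, and it assumes unweighted data so that every entry really is a whole-number pair count bounded by (n^2 - n)/2. The toy matrix is made up for the demonstration.

```r
# Hypothetical helper (not from the survival package).
# m: numeric matrix of pair counts; n: number of observations compared.
# Assumes unweighted data, so entries should be whole numbers in [0, n(n-1)/2].
check_count_matrix <- function(m, n) {
  m <- as.matrix(m)
  max_pairs <- n * (n - 1) / 2
  finite <- is.finite(m)              # FALSE for NA, NaN, Inf and -Inf
  c(
    non_finite  = sum(!finite),
    negative    = sum(m[finite] < 0),
    non_integer = sum(m[finite] != round(m[finite])),
    too_large   = sum(m[finite] > max_pairs)
  )
}

# Made-up toy matrix containing one problem of each kind (n = 4, so max 6 pairs)
bad <- matrix(c(3, -1, 2.5, NaN, 100, 0), nrow = 3)
problems <- check_count_matrix(bad, n = 4)
print(problems)
```

Running the same check on `test$influence`, with n set to the number of rows in the test set, would show which of the four symptoms appear in a given run.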

This bug seems to occur randomly, and I suspect there is a memory allocation problem in the C subroutine, although I haven't been able to check that yet.

Example code
library(survival)
set.seed(18)
fitimp <- survival::coxph(formula, data = d_train)
set.seed(54)
test <- concordance(fitimp, newdata = d_test, influence = 2, ranks = TRUE)
View(test$influence)

Correct output
[Screenshot, 2022-03-28, not reproduced]

One example of bugged output
[Screenshot, 2022-03-28, not reproduced]

I'm not quite clear on this

  1. Random numbers don't play a role in the concordance function.
  2. Your example code doesn't tell me anything concrete. What is the formula? What is d_train and/or d_test?
  3. What exactly are you printing out in the output?

A bit more to go on would help me dig into this.
Terry

I have managed to recreate the problem, and it was yet another facet of the rounding issues discussed in the "tied times" vignette. The use of newdata gave a path through the routine that skipped the aeqSurv checks, while the use of a large simulated data set almost guaranteed that there would be response times that differ by less than machine precision. I've now plugged that gap.
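The rounding issue can be illustrated with a minimal sketch. The three times below are made up for the example; it shows two values that are equal on paper but differ in floating point, and how `aeqSurv` from the survival package merges them. It does not reproduce the newdata code path itself.

```r
library(survival)

# Two times that are equal on paper but not in floating point
t1 <- 0.3
t2 <- 0.1 + 0.2
print(t1 == t2)                    # FALSE: they differ in the last bits
print(isTRUE(all.equal(t1, t2)))   # TRUE: equal within numerical tolerance

# Toy response (made-up values): the first two subjects "should" be tied
y <- Surv(c(t1, t2, 1), c(1, 1, 1))
n_before <- length(unique(unclass(y)[, "time"]))

# aeqSurv treats times closer than its tolerance as the same value
y_fixed <- aeqSurv(y)
n_after <- length(unique(unclass(y_fixed)[, "time"]))

print(c(before = n_before, after = n_after))
```

Without this step, any routine that counts pairs by comparing times exactly would treat the first two subjects as untied, which is the kind of mismatch the skipped aeqSurv check allowed.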