Inconsistent results of concordance function
We're using the concordance function on a training data set (n ~ 200,000) and evaluating concordance on a test set (n ~ 100,000). When running concordance with "newdata" we don't always get the same result, even after setting a seed.
When the newdata option is specified, the influence matrix returned by concordance sometimes contains negative, non-integer, or impossibly large or small values, or NaNs. Since these are counts, this should never happen: each count should be bounded by the possible number of comparisons, i.e. it should lie in the range [0, (n^2 - n)/2].
This bug seems to occur randomly, and I suspect there is a memory allocation problem in the C subroutine, although I haven't been able to check that yet.
Example code
set.seed(18)
fitimp <- survival::coxph(formula, data=d_train)
set.seed(54)
test <- concordance(fitimp, newdata = d_test, influence = 2, ranks = TRUE)
View(test$influence)
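The symptoms described above can be checked programmatically rather than by eye. The following is a minimal sketch in base R, assuming (as the report does) that the influence matrix holds unweighted counts bounded by (n^2 - n)/2. The helper name check_count_matrix and the toy matrix are hypothetical and are not part of the survival package.

```r
# Hypothetical helper (not part of the survival package): return the
# row/column positions of entries that cannot be valid counts for n
# observations, using the bound stated in the report.
check_count_matrix <- function(m, n) {
  max_pairs <- (n^2 - n) / 2      # largest possible number of comparisons
  bad <- !is.finite(m) |          # NaN, NA, or infinite values
    m < 0 |                       # negative counts
    m != round(m) |               # non-integer values
    m > max_pairs                 # more than all possible comparisons
  bad[is.na(bad)] <- TRUE         # treat anything undecidable as suspect
  which(bad, arr.ind = TRUE)
}

# Toy matrix standing in for test$influence (made-up values, n = 4).
toy <- matrix(c(3, 0, -1, 2.5, NaN, 1e12), nrow = 3)
bad_idx <- check_count_matrix(toy, n = 4)
print(bad_idx)

# A matrix of plausible counts should produce no flagged entries.
ok_idx <- check_count_matrix(matrix(c(0, 1, 6, 3), nrow = 2), n = 4)
```

Applied to the real output, this would be called as check_count_matrix(test$influence, n = nrow(d_test)). A non-empty result on some runs but not others would document the inconsistency.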
I'm not quite clear on this:
- random numbers don't play a role in the concordance function.
- Your example code doesn't tell me anything concrete. What is the formula? What is d_train and/or d_test?
- What exactly are you printing out in the output?
A bit more to go on would help me dig into this.
Terry
I have managed to recreate the problem, and it was yet another facet of the rounding issues discussed in the "tied times" vignette. The use of newdata gave a path through the routine that skipped the aeqSurv checks, and the use of a large simulated data set almost guaranteed that there would be response times that differ by less than machine precision. I've now plugged that gap.
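For readers unfamiliar with the rounding issue, here is a small illustration of how two times that should be tied can differ at the level of machine precision, and how aeqSurv treats them. It is a sketch with made-up values, not the data from the report. The aeqSurv step runs only if the survival package is installed.

```r
# Two values that are equal on paper but not in floating point.
t1 <- 0.1 + 0.2
t2 <- 0.3

exact_equal <- (t1 == t2)                     # exact comparison
near_equal  <- isTRUE(all.equal(t1, t2))      # comparison with a tolerance
n_unique_raw <- length(unique(c(t1, t2, 1)))  # distinct times seen by exact matching

print(c(exact = exact_equal, near = near_equal))
print(n_unique_raw)

# aeqSurv adjusts a Surv object so that times differing by less than a
# tolerance are treated as tied.
if (requireNamespace("survival", quietly = TRUE)) {
  y <- survival::Surv(c(t1, t2, 1), c(1, 1, 1))
  y_adj <- survival::aeqSurv(y)
  # The two near-tied times should now collapse to a single value.
  print(length(unique(y_adj[, "time"])))
}
```

A code path that counts ties on the raw times treats t1 and t2 as distinct, which is the kind of mismatch the fix addresses.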