sahirbhatnagar/casebase

casebase in the presence of heavy censoring


I just saw a talk about heavy right-censoring in the UK Biobank (>90%). The speaker mentioned that they needed to use a saddlepoint approximation to account for it. I'm not sure what that means or what it applies to, but it got me thinking about how casebase behaves under heavy censoring. Are further adjustments needed?

Here is one recent reference: https://onlinelibrary.wiley.com/doi/abs/10.1002/cjs.11491

That's interesting. As a first thought: "heavy censoring" is sorta equivalent to "very few events", but ~10% of a million people is still a lot of events, so I think casebase would work fine.

On the other hand, the approximation to the full likelihood that we're using probably gets worse under heavy censoring when the ratio of base moments to events is held fixed (I think Olli looked at that in one of his papers). So one way to get a better approximation would be to increase the number of base moments. And this may be where casebase starts running into convergence issues, when the case-base dataset is quite large and events are rare.
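
To make the size trade-off concrete, here is a rough numpy sketch (not the casebase API; `simulate_case_base` and all of its parameters are made up for illustration). It assumes a constant hazard with administrative censoring chosen so that roughly 90% of subjects are censored, samples base moments uniformly over person-time, and uses the offset log(B/b), where B is the total person-time and b the number of base moments.

```python
import numpy as np

def simulate_case_base(n=100_000, hazard=0.01, follow_up=10.0, ratio=10, seed=0):
    """Hypothetical helper: simulate censored exponential data, draw a case-base sample."""
    rng = np.random.default_rng(seed)

    # Exponential event times, administratively censored at `follow_up`.
    event_time = rng.exponential(1.0 / hazard, size=n)
    time = np.minimum(event_time, follow_up)
    status = event_time <= follow_up

    n_cases = int(status.sum())            # case series: one row per event
    total_person_time = float(time.sum())  # B
    n_base = ratio * n_cases               # b: base series size at this ratio

    # Base moments: pick a subject with probability proportional to their
    # follow-up time, then a uniform moment within that follow-up.
    subject = rng.choice(n, size=n_base, p=time / total_person_time)
    base_moment = rng.uniform(0.0, time[subject])

    # Intercept-only logistic model with offset log(B / b). Its MLE has a
    # closed form: logit(c / (c + b)) - offset = log(c / B).
    offset = np.log(total_person_time / n_base)
    log_hazard_hat = np.log(n_cases / n_base) - offset

    return {
        "censoring": 1.0 - n_cases / n,
        "n_cases": n_cases,
        "n_base": n_base,
        "n_rows": n_cases + n_base,        # rows passed to the logistic fit
        "base_moment": base_moment,
        "hazard_hat": float(np.exp(log_hazard_hat)),
    }

for r in (1, 10, 100):
    out = simulate_case_base(ratio=r)
    print(r, out["n_rows"], round(out["censoring"], 3), out["hazard_hat"])
```

The number of rows grows as (1 + ratio) times the number of events, which is where the dataset size concern comes from. Note that in this intercept-only setting the estimate reduces to events divided by person-time whatever the ratio is, so the ratio only starts to matter once the hazard depends on time or covariates.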

So I guess we should look at the literature on rare events and logistic regression.