Predictive-Survival-Analysis using Machine Learning Algorithms

A problem frequently faced by applied statisticians is the analysis of time-to-event data. The analysis of survival experiments is complicated by censoring, where an individual’s life length is known only to fall within a certain period of time, and by truncation, where individuals enter the study only if they survive a sufficient length of time, or are included in the study only if the event has occurred by a given date. This project emphasizes applying, exploring and comparing survival analysis techniques in semiparametric and nonparametric settings.
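To make the effect of right-censoring concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) estimator, the standard nonparametric estimate of the survival function S(t). It is a plain-Python illustration, not this project's code: the function name `kaplan_meier` and the toy data are hypothetical, and truncation is not handled. Censored subjects still count as "at risk" up to and including their own time, which is how they contribute information without contributing an event.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  -- observed follow-up time per subject (event or censoring time)
    events -- 1 if the event was observed, 0 if the subject was right-censored
    Returns a list of (t, S(t)) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    n_at_risk = len(times)        # everyone is at risk before the first time
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:                     # S(t) only drops at observed event times
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        # censored subjects count as at risk at their own time, then drop out
        n_at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # hypothetical toy data: 6 subjects, one censored at t=2 and one at t=4
    print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

With no censoring, the estimate reduces to the ordinary empirical survival function; censoring only changes the size of the risk set used at later event times.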

ABSTRACT

The objective of this study is to compare the performance of Cox Proportional Hazards regression (CoxPH) and Random Survival Forests (RSF) on a real dataset from bone marrow transplantation. Cox regression is the most popular survival analysis method: it is semiparametric, so it can estimate the effect of several variables on the time until a specified event occurs without assuming a particular form for the baseline hazard. More recently, random survival forests (Ishwaran et al., 2008) have been used for the analysis of survival data; RSF is an ensemble tree method for right-censored survival data. The motivation of this study is to identify the most important factors influencing the success or failure of the transplantation procedure. Healthcare data are valuable and cost considerable money, time and manpower to collect, so we do not want to discard variables unnecessarily. The priority is therefore to maximize model accuracy while retaining as many variables as possible.
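RSF grows many survival trees and, at each node, chooses the split that best separates the survival experience of the two daughter nodes. One splitting rule described by Ishwaran et al. (2008) is the standardized log-rank statistic L(x, c), where the node is split into x <= c and x > c. The sketch below scores every candidate cut point c of a single numeric covariate at one node and keeps the one maximizing |L(x, c)|. It is a plain-Python illustration with hypothetical function names, not this project's code; a full RSF additionally bootstraps subjects, samples candidate variables at each node, and averages the trees' cumulative hazard estimates.

```python
import math


def logrank_split_statistic(times, events, in_left):
    """Standardized log-rank statistic L for one candidate split of a node.

    times   -- follow-up time per subject in the node
    events  -- 1 if the event was observed, 0 if right-censored
    in_left -- True if the subject is sent to the left daughter node
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    num, var = 0.0, 0.0
    for tj in event_times:
        # Y, Y1: subjects at risk at tj in the node and in the left daughter
        Y = sum(1 for t in times if t >= tj)
        Y1 = sum(1 for t, l in zip(times, in_left) if t >= tj and l)
        # d, d1: events at tj in the node and in the left daughter
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d1 = sum(1 for t, e, l in zip(times, events, in_left)
                 if t == tj and e == 1 and l)
        num += d1 - Y1 * d / Y  # observed minus expected events on the left
        if Y > 1:
            var += (Y1 / Y) * (1 - Y1 / Y) * ((Y - d) / (Y - 1)) * d
    return num / math.sqrt(var) if var > 0 else 0.0


def best_split(x, times, events):
    """Return (c, |L|) for the cut point x <= c that maximizes |L(x, c)|."""
    best_c, best_stat = None, -1.0
    # the largest value of x is skipped: it would leave the right node empty
    for c in sorted(set(x))[:-1]:
        stat = abs(logrank_split_statistic(times, events, [xi <= c for xi in x]))
        if stat > best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat
```

Taking the absolute value means a split is rewarded for separating survival curves regardless of which daughter node has the worse prognosis.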

HIGHLIGHTS

  • Dealing with Data Imbalance
  • Missing Data
  • Fixing Multicollinearity among Categorical Variables
  • Heuristic Decisions
  • Application of Subjective Knowledge
  • Cox-LASSO
  • Random Survival Forest
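Cox-LASSO fits the Cox model by minimizing the negative log partial likelihood (averaged over the n subjects) plus an L1 penalty, lambda * sum(|beta_k|), which shrinks some coefficients exactly to zero; lambda therefore controls how many variables survive, which bears directly on the goal of retaining as many variables as possible. The sketch below is a minimal pure-Python illustration using proximal gradient descent (ISTA) with the Breslow handling of tied times. The function names, step size and iteration count are hypothetical choices, not this project's code; a real analysis would normally use a dedicated solver (for example, glmnet's Cox family) and choose lambda by cross-validation.

```python
import math


def _linear_predictor(beta, X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def neg_log_partial_likelihood(beta, X, times, events):
    """Breslow negative log partial likelihood, averaged over n subjects."""
    n = len(times)
    eta = _linear_predictor(beta, X)
    total = 0.0
    for i in range(n):
        if events[i]:
            # risk set R(t_i): everyone still under observation at t_i
            risk = sum(math.exp(eta[j]) for j in range(n) if times[j] >= times[i])
            total += eta[i] - math.log(risk)
    return -total / n


def gradient(beta, X, times, events):
    """Gradient of neg_log_partial_likelihood with respect to beta."""
    n, p = len(X), len(beta)
    w = [math.exp(e) for e in _linear_predictor(beta, X)]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored subjects only appear inside risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(w[j] for j in risk)
        for k in range(p):
            # risk-set weighted mean of covariate k at time t_i
            mean_k = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= (X[i][k] - mean_k) / n
    return grad


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrinks z toward 0, zeroing |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def cox_lasso(X, times, events, lam, step=0.1, n_iter=1000):
    """Minimize neg_log_partial_likelihood + lam * sum(|beta_k|) with ISTA."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gradient(beta, X, times, events)
        beta = [soft_threshold(b - step * gk, step * lam) for b, gk in zip(beta, g)]
    return beta
```

Each iteration takes a gradient step on the smooth partial-likelihood term and then applies soft-thresholding, which is what produces exact zeros; a fixed step size is used here only for simplicity.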