Application examples of GuanRank in deep learning (images), lightGBM and recurrent neural networks (time-series)
- python 3.0
- R packages: "survival", "timeROC", "Bolstad2", "ROCR", "randomForestSRC"
In each of directory, run simulate_all.sh will return all relevant simulation results.
- The delta directory simulates how the death rate scale will affect the performance of the ranking algorithm.
- The delta_RSF_fixed directory simulates how the death rate scale will affect the performance of the random survival forest algorithm.
- The delta_cox_fixed directory simulates how the death rate scale will affect the performance of the Cox model.
simulate_new_feature_example_fixed, simulate_new_feature_example_RSF_fixed, simulate_new_feature_example_cox_fixed
- The simulate_new_feature_example_fixed directory simulates how the number of features and number of examples will affect the performance of the ranking algorithm.
- The simulate_new_feature_example_RSF_fixed directory simulates how the number of features and number of examples will affect the performance of the random survival forest algorithm.
- The simulate_new_feature_example_cox_fixed directory simulates how the number of features and number of examples will affect the performance of the Cox model. Because two parameters are tested in this section, we create a matrix of performance estimations.
simulate_new_noise_example_fixed, simulate_new_noise_example_RSF, simulate_new_noise_example_cox_fixed
- The simulate_new_noise_example_fixed directory simulates how the noise level and the number of examples will affect the performance of the ranking algorithm.
- The simulate_new_noise_example_RSF directory simulates how the noise level and the number of examples will affect the performance of the random survival forest algorithm.
- The simulate_new_noise_example_cox_fixed directory simulates how the noise level and the number of examples will affect the performance of the Cox model. Because two parameters are tested in this section, we create a matrix of performance estimations.
simulate_new_noise_example_fixed, simulate_new_noise_example_RSF, simulate_new_noise_example_cox_fixed
- The simulate_new_noise_example_fixed directory simulates how the noise level and the number of examples will affect the performance of the ranking algorithm.
- The simulate_new_noise_example_RSF directory simulates how the noise level and the number of examples will affect the performance of the random survival forest algorithm.
- The simulate_new_noise_example_cox_fixed directory simulates how the noise level and the number of examples will affect the performance of the Cox model. Because two parameters are tested in this section, we create a matrix of performance estimations.
- The simulate_scaling director simulates how the feature scales will affect the performance of the ranking algorithm.
- The simulate_scaling_RSF director simulates how the feature scales will affect the performance of the random survival forest algorithm.
- The simulate_scaling_cox director simulates how the feature scales will affect the performance of the Cox model.
- The simulation_new directory simulates how the noise level and the number of features will affect the performance of the ranking algorithm.
- The simulation_new_RSF directory simulates how the noise level and the number of features will affect the performance of the random survival forest algorithm.
- The simulation_new_cox directory simulates how the noise level and the number of features will affect the performance of the Cox model. Because two parameters are tested in this section, we create a matrix of performance estimations.
Deep learning for survival analysis using PyTorch.
- main.py: Code for main entry
- net.py: Code for LSTM model
- preprocess.py: Code for preprocessing
- dataset.py: Dataset for disk survival analysis
- python 3.6
- pytorch 1.3
- tqdm 4.43
- pandas 0.25
- numpy 1.16
- sci-kit learn 0.21
- scikit-survival 0.6
...
$ SEED=42 && echo
-
Preprocessing:
- Data removing: remove low quality data and remove non-important features
- Data imputation: lots of missing data especially at early times
- Data normalization: normalize features
- Timesteps: the number of timesteps to be used
-
Run all experiment for cross-validation: sh bash_all.sh ...
- python3
- numpy It comes pre-packaged in Anaconda.
- tensorflow (1.14)
- keras (2.2.5)
- lightgbm
- download_image.py : download images from gdc portal
- convert_svs_into_png.py : convert the format of images from svs into png
- extract_clinical_feature.py : extract clincal features and one-hot encode them
- calculate_guanrank.r and guanrank.R : calculate GuanRank labels
- nn1_code_only : deep neural network models that use images as input
- lgbm1_code_only : tree-based lightGBM models that use clinical features as input