T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression (AISTATS 2023)

Source code for the T-Phenotype approach proposed in the paper "T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression".

Installation & Environment Setup

`pip` install

The simplest way to install is through pip:

pip install git+https://github.com/yvchao/tphenotype

# Alternatively:
# pip install git+https://github.com/vanderschaarlab/tphenotype

From source

To run the experiments, clone this repository and install it in editable mode:

git clone git@github.com:yvchao/tphenotype.git
# Alternatively:
# git clone git@github.com:vanderschaarlab/tphenotype.git

# Navigate into the repo:
cd tphenotype

# pip-install in editable mode:
pip install -e .

Note on extras

The following pip extras are available:

benchmarks (pip install -e .[benchmarks]): Adds the dependencies needed for running the benchmarks. Install this extra if replicating benchmark results.
- External benchmarks require TensorFlow 1.x, which is installed only if Python is <= 3.8 (TensorFlow 1.x is not compatible with newer Python versions). Otherwise, these benchmarks cannot be run.
dev (pip install -e .[dev]): Adds the benchmarks extra and additional development-related dependencies.

For full details, see the [options.extras_require] section in setup.cfg.
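
As a quick check before installing the benchmarks extra, the snippet below reports whether the current interpreter meets the "Python <= 3.8" condition described above. It is a minimal sketch: the helper name tf1_extra_applies is hypothetical, and the authoritative condition is whatever setup.cfg declares.

```python
import sys


def tf1_extra_applies(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.8 or older.

    Assumption: this mirrors the "Python <= 3.8" wording of this README;
    the exact rule used at install time is defined in setup.cfg.
    """
    return tuple(version_info[:2]) <= (3, 8)


print("External TF1 benchmarks installable:", tf1_extra_applies())
```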

Note on CUDA

In order to use CUDA, make sure your virtual environment (or conda environment) has the appropriate CUDA binaries. See PyTorch Get Started for details.
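
To confirm that PyTorch can see a CUDA device in the active environment, a check along the following lines may help. This is a sketch, not part of the repository: describe_torch_cuda is a hypothetical helper, and it relies only on torch.cuda.is_available() and torch.version.cuda.

```python
import importlib.util


def describe_torch_cuda():
    """Summarize whether PyTorch is installed and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in this environment.
        return {"torch_installed": False, "cuda_available": False, "cuda_version": None}

    import torch

    return {
        "torch_installed": True,
        # True only if a usable CUDA device and driver are visible to torch.
        "cuda_available": torch.cuda.is_available(),
        # CUDA version the torch binaries were built with (None for CPU-only builds).
        "cuda_version": torch.version.cuda,
    }


print(describe_torch_cuda())
```

The reported cuda_version is also the value to match when choosing an nvidia-tensorflow release for the benchmarks extras.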

The rest of this section is only relevant to benchmarks or dev installation extras with CUDA.

The benchmarks (and dev) install extras will install tensorflow==1.15.5 if your Python version is <= 3.8, as this is needed by external benchmarks. It is tricky to make TF1 work with CUDA, and you may find it easier to just use the CPU for these benchmarks.

To make TF1 CUDA compatible, you will need to check compatibility, e.g. here. Since TF1 is an old library, it does not officially support most modern CUDA devices. However, NVIDIA maintains a compatible version of TF1. The benchmarks and dev extras install it if your environment has Python 3.8, the only Python version for which binaries are available. Specifically, nvidia-pyindex and nvidia-tensorflow[horovod] will be installed.

To check CUDA has worked for TF1, run the following in Python:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

If you see no errors and a GPU device is listed at the end of the output (if one is available), the setup succeeded.

Compatibility issues may arise, for instance between the CUDA version used by PyTorch and the one used by TensorFlow. You will need to check the "NVIDIA CUDA Runtime" compatibility here and install the version of nvidia-tensorflow that matches the CUDA binaries used by torch. For example, if your installation of torch uses CUDA 11.8, you will need to install nvidia-tensorflow v22.12, like so:

pip install "nvidia-tensorflow[horovod]==1.15.5+nv22.12"
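
The requirement string above follows a fixed pattern, which the sketch below reproduces in Python. The helper names and the KNOWN_RELEASES table are hypothetical; the table contains only the single CUDA 11.8 to v22.12 pairing stated in this README, so any other CUDA version must be looked up in NVIDIA's compatibility information.

```python
# Only pairing stated in this README: torch built with CUDA 11.8 -> release 22.12.
# Other pairings are deliberately not guessed here.
KNOWN_RELEASES = {"11.8": "22.12"}


def nvidia_tf_requirement(release, tf_version="1.15.5", extra="horovod"):
    """Build a pip requirement such as 'nvidia-tensorflow[horovod]==1.15.5+nv22.12'."""
    return f"nvidia-tensorflow[{extra}]=={tf_version}+nv{release}"


def requirement_for_cuda(cuda_version):
    """Return the pip requirement for a known CUDA version, or None if unknown."""
    release = KNOWN_RELEASES.get(cuda_version)
    if release is None:
        return None
    return nvidia_tf_requirement(release)


print(requirement_for_cuda("11.8"))
```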

Datasets

Three datasets are used in the experiments.

Synthetic data: provided in this repo as data/synthetic/data_mixed.npz; can be generated by running data/synthetic/data_generation.ipynb.
PhysioNet ICU data: publicly available at PhysioNet.
ADNI data: can be downloaded from LONI.
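
To get a first look at the bundled synthetic data, the archive can be inspected with NumPy. The sketch below lists the stored arrays and their shapes without assuming any particular array names; summarize_npz is a hypothetical helper, not part of the package.

```python
from pathlib import Path

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive.

    Note: arrays stored with object dtype would need allow_pickle=True,
    which is left off here as the safer default.
    """
    with np.load(Path(path)) as archive:
        return {name: archive[name].shape for name in archive.files}


# Path taken from this README; run from the repository root.
synthetic = Path("data/synthetic/data_mixed.npz")
if synthetic.exists():
    for name, shape in summarize_npz(synthetic).items():
        print(name, shape)
```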

Experiments

The experiments consist of three major parts.

notebooks/benchmark/: run bash run_experiment.sh from within notebooks/benchmark/, then run summary.ipynb to generate benchmark results on the three datasets.
notebooks/case_study/: run experiment_adni.ipynb to generate the major results in the main manuscript.
notebooks/appendix/: run the four notebooks to generate the remaining results included in the appendix.
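
The benchmark step above can also be scripted. The following sketch builds the two commands and runs them from notebooks/benchmark/. It assumes that summary.ipynb can be executed headlessly with jupyter nbconvert, which this README does not state; opening the notebook interactively is the documented route. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def benchmark_commands(repo_root="."):
    """Return (command, working_directory) pairs for the benchmark workflow."""
    bench_dir = Path(repo_root) / "notebooks" / "benchmark"
    return [
        (["bash", "run_experiment.sh"], bench_dir),
        # Assumption: headless execution of the summary notebook via nbconvert.
        (["jupyter", "nbconvert", "--to", "notebook", "--execute", "summary.ipynb"], bench_dir),
    ]


def run_benchmarks(repo_root=".", dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for command, cwd in benchmark_commands(repo_root):
        print(f"[{cwd}] {' '.join(command)}")
        if not dry_run:
            # check=True stops the workflow if a step fails.
            subprocess.run(command, cwd=cwd, check=True)


run_benchmarks(dry_run=True)
```

Passing dry_run=False from the repository root runs both steps; the default only prints them.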

Notes

The exact hyperparameters used in the paper experiments can be found here. Place these under notebooks/benchmark/hyperparam_selection/ before running run_experiment.sh to use them in the benchmark experiments.
Some experiments, in particular the "ICU" experiments, are sensitive to the specific hardware and sampling order. The exact results may differ somewhat in your local environment, but the argument in the paper is unaffected.
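
Placing the downloaded hyperparameters, as described in the first note above, can be done with a short script. This is a sketch under assumptions: the downloaded files are taken to sit in a flat directory (the actual layout of the download is not specified here), and install_hyperparams is a hypothetical helper.

```python
import shutil
from pathlib import Path


def install_hyperparams(download_dir, repo_root="."):
    """Copy every file from download_dir into notebooks/benchmark/hyperparam_selection/.

    Returns the names of the copied files in sorted order.
    """
    target = Path(repo_root) / "notebooks" / "benchmark" / "hyperparam_selection"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(download_dir).iterdir()):
        if source.is_file():
            # copy2 also preserves file timestamps.
            shutil.copy2(source, target / source.name)
            copied.append(source.name)
    return copied
```

For example, install_hyperparams("path/to/downloaded_hyperparams") run from the repository root, followed by bash run_experiment.sh in notebooks/benchmark/.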

Citation

If you find the software useful, please consider citing the following paper:

@inproceedings{tphenotype2023,
  title={T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression},
  author={Qin, Yuchao and van der Schaar, Mihaela and Lee, Changhee},
  booktitle={Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023},
  year={2023}
}