This is the code for our paper Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence, which was accepted as a spotlight at NeurIPS 2020. Please use the following temporary BibTeX entry when citing our paper:
@InCollection{Rieck20,
author = {Rieck, Bastian and Yates, Tristan and Bock, Christian and Borgwardt, Karsten and Wolf, Guy and Turk-Browne, Nick and Krishnaswamy, Smita},
title = {Uncovering the Topology of Time-Varying {fMRI} Data using Cubical Persistence},
year = {2020},
eprint = {2006.07882},
archiveprefix = {arXiv},
primaryclass = {q-bio.NC},
pubstate = {forthcoming},
booktitle = {Advances in Neural Information Processing Systems~33~(NeurIPS)},
publisher = {Curran Associates, Inc.},
}
You will find numerous generated files, such as summary statistics and embeddings, in this repository. The raw fMRI data, however, are too large to be easily re-distributed here. Please access them under https://openneuro.org/datasets/ds000228/versions/1.0.0.
The following instructions demonstrate how to use the package and how to reproduce the experiments.
The code requires poetry
to be installed.
Please install poetry using the suggested installation procedure
for your operating system. Afterwards, you can install this package as
follows:
poetry install
Please use poetry shell
to start a shell in which the
package and its dependencies are installed. We will assume that all
commands are executed in the ephemeral
code directory.
Subsequently, we will discuss how to reproduce the individual experiments and plots.
The folder results/summary_statistics
contains the results of the
summary statistics calculation as JSON files. Each key is a participant
ID; the corresponding value is a nested dictionary containing the
respective time series, calculated based on one of the summary
statistics mentioned in the paper. Calculating these summary statistics
yourself is currently not possible because it requires the raw data.
However, the file
ephemeral/scripts/calculate_summary_statistics.sh
shows how to perform
this calculation, provided the raw data are available.
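To give you an idea of the file layout, here is a minimal sketch for reading these files. The exact nesting of the dictionaries is best checked against the files themselves; treat the access pattern below merely as an illustration:

import json

# Load the pre-computed summary statistics for the whole-brain mask.
with open('../results/summary_statistics/brainmask.json') as f:
    data = json.load(f)

# The top-level keys are participant IDs; each value is a nested
# dictionary with one time series per summary statistic.
participant_id, statistics = next(iter(data.items()))
print(participant_id, list(statistics.keys()))

# Extract a single time series, e.g. the 1-norm of total persistence
# (assuming the statistic maps to a list of per-time-step values).
time_series = statistics['total_persistence_p1']
print(len(time_series))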
That being said, to reproduce Figures 3a, 3b, 3c, and 3d, please run the following commands:
# For the topological summary statistics; notice that
# 'total_persistence_p1' is the same as the 1-norm that
# we used in the paper.
python embed_summary_statistics.py -s 'total_persistence_p1' ../results/summary_statistics/brainmask.json
python embed_summary_statistics.py -s 'infinity_norm_p1' ../results/summary_statistics/brainmask.json
# For `baseline-tt` (see comments above concerning the data):
python embed_baseline_autocorrelation.py ../results/baseline_autocorrelation/brainmask/*.npz
# For `baseline-pp` (see comments above concerning the data):
python embed_baseline_autocorrelation.py ../results/baseline_autocorrelation_parcellated/brainmask/*.npz
Here is an example output file (showing the infinity norm summary statistic embedding based on the whole-brain mask):
You can also calculate these embeddings for other masks, such as
occipitalmask or xormask. We have provided the necessary files
for this, but did not discuss these masks in the paper for space
reasons.
Please note that some of the plots might appear 'flipped' (with respect to the y-axis); this only affects the visualisation, but not the interpretation.
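For illustration, here is a minimal stand-in for the embedding step, using PCA; the actual script may employ a different embedding method, so treat this purely as a sketch. The arbitrary sign of principal components is one reason why such plots can appear flipped:

import json
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

with open('../results/summary_statistics/brainmask.json') as f:
    data = json.load(f)

# One row per participant, one column per time step (this assumes
# that all participants share the same number of time steps).
X = np.array([v['infinity_norm_p1'] for v in data.values()])
embedding = PCA(n_components=2).fit_transform(X)

# The sign of each principal component is arbitrary, which is why
# such plots may appear flipped along an axis.
plt.scatter(embedding[:, 0], embedding[:, 1])
plt.show()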
This reproduces Figure 3e in the paper.
To run the corresponding calculations, call the predict_age.py
script
on different input data sets. The script is sufficiently smart to
automatically use the appropriate fitting method based on the input
data:
# For the topological summary statistics. Again, they also work for
# different brain masks (not showing all of them here). Please note
# that `total_persistence_p1` is equivalent to the `p1_norm` that we
# describe in the paper (the paper's terminology was chosen because
# it is more consistent).
python predict_age.py -s 'total_persistence_p1' ../results/summary_statistics/brainmask.json
python predict_age.py -s 'infinity_norm_p1' ../results/summary_statistics/brainmask.json
# For the baseline-tt and baseline-pp experiments, respectively. For
# both experiments, other brain masks are also available.
python predict_age.py ../results/baseline_autocorrelation/brainmask/*.npz
python predict_age.py ../results/baseline_autocorrelation/occipitalmask/*.npz
python predict_age.py ../results/baseline_autocorrelation/xormask/*.npz
python predict_age.py ../results/baseline_autocorrelation_parcellated/brainmask/*.npz
python predict_age.py ../results/baseline_autocorrelation_parcellated/occipitalmask/*.npz
python predict_age.py ../results/baseline_autocorrelation_parcellated/xormask/*.npz
Here is an example output of what you get for the infinity norm calculation:
python predict_age.py -s 'infinity_norm_p1' ../results/summary_statistics/brainmask.json
122it [00:00, 257.88it/s]
R^2: 0.37
Correlation coefficient: 0.61
MSE: 3.38
You can see the R-squared coefficient (used for fitting the model), the correlation coefficient (reported in the main text of the paper), and the mean squared error (reported in the supplemental materials, in an extended version of the table shown in the main paper).
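For reference, these three quantities can be computed from true ages and predictions as follows; the arrays below are hypothetical stand-ins, not data from the paper:

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true ages and model predictions.
y_true = np.array([4.5, 5.0, 7.2, 8.1, 9.3])
y_pred = np.array([5.0, 5.5, 6.8, 7.9, 8.6])

print(f'R^2: {r2_score(y_true, y_pred):.2f}')
print(f'Correlation coefficient: {pearsonr(y_true, y_pred)[0]:.2f}')
print(f'MSE: {mean_squared_error(y_true, y_pred):.2f}')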
During the review period, we implemented additional baselines. To
reproduce the cells entitled tt-corr-tda
and pp-corr-tda
from the
table in Figure 3e, please use data from two new folders:
- results/persistence_images_matrices: this folder contains persistence images generated from time-by-time volumes.
- results/persistence_images_from_parcellated_matrices: this folder contains persistence images generated from parcellated volumes.
The folder names contain the word matrices because the persistence
images are generated from correlation matrices.
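To illustrate the idea behind these files, here is a minimal sketch of turning a persistence diagram into a persistence image; the sigma and resolution values mirror the sigma1.0_r20 naming of the pre-computed files, but this is not the exact implementation used for the paper:

import numpy as np

def persistence_image(diagram, sigma=1.0, r=20, extent=(0, 1, 0, 1)):
    # Place a Gaussian of bandwidth sigma at each (birth, persistence)
    # point on an r-by-r grid, weighting each point by its persistence.
    births = diagram[:, 0]
    persistences = diagram[:, 1] - diagram[:, 0]
    xs = np.linspace(extent[0], extent[1], r)
    ys = np.linspace(extent[2], extent[3], r)
    grid_x, grid_y = np.meshgrid(xs, ys)
    image = np.zeros((r, r))
    for b, p in zip(births, persistences):
        image += p * np.exp(
            -((grid_x - b)**2 + (grid_y - p)**2) / (2 * sigma**2)
        )
    return image

# Example: a tiny diagram with two (birth, death) pairs.
diagram = np.array([[0.1, 0.5], [0.2, 0.9]])
print(persistence_image(diagram).shape)  # (20, 20)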
To calculate the cohort brain state trajectories shown in Figure 4, you need to use the corresponding persistence images (these files are also pre-computed because they involve the raw data). To visualise the brain state trajectories for the whole-brain mask, for example, please use the following call:
python visualise_cohort_trajectories.py -d ../results/persistence_images/brainmask_sigma1.0_r20.json
This will create a figure similar to this one:
Note that the ordering of cohorts has been swapped in the paper. As we describe in the main text, we were 'blinded' to the actual ages of the participants until the very end; hence, all visualisations are shown using the original cohort labels.
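If you want to understand the gist of the trajectory computation, here is a heavily simplified sketch; the JSON layout and the PCA embedding used below are assumptions for illustration, and the actual script may differ (in particular, the averaging would be restricted to the participants of a single cohort):

import json
import numpy as np
from sklearn.decomposition import PCA

with open('../results/persistence_images/brainmask_sigma1.0_r20.json') as f:
    images = json.load(f)

# Assume each participant maps to an array of shape (time steps,
# image pixels). Averaging over participants yields a sequence of
# mean persistence images, which we then embed as a 2D trajectory.
cohort = np.mean([np.asarray(v) for v in images.values()], axis=0)
trajectory = PCA(n_components=2).fit_transform(cohort)
print(trajectory.shape)  # (number of time steps, 2)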
The visualisation script offers additional options, which we did not describe in the paper so far, such as a 'smoothing' process based on rolling windows. Feel free to experiment by calling
python visualise_cohort_trajectories.py --help
to see the available options! The --help flag is understood by most of the tools that we provide, making it possible to explore the code with ease.
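As an illustration of the rolling-window idea, here is a minimal smoothing sketch; it is not the script's exact implementation:

import numpy as np

def rolling_mean(trajectory, w=5):
    # Smooth each coordinate of a 2D trajectory independently by
    # averaging over a window of w consecutive time steps.
    kernel = np.ones(w) / w
    return np.column_stack(
        [np.convolve(trajectory[:, i], kernel, mode='valid')
         for i in range(trajectory.shape[1])]
    )

trajectory = np.random.rand(100, 2)   # stand-in for a trajectory
print(rolling_mean(trajectory).shape)  # (96, 2)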
The corresponding cohort trajectories that we describe in the paper are
stored in the folder results/cohort_trajectories
. Note that they might
be overwritten when calling the script.
To create the across-cohort-variability data (that we subsequently depict in Figure A.4 in the appendix and whose histograms are shown in Figure 4 in the main text), you can use the following code:
python analyse_variability_across_cohorts.py -d ../results/persistence_images/brainmask_sigma1.0_r20.json
This example calculates the across-cohort variability curve of the whole-brain mask and depicts it. The figure in the paper (Figure A.4 in the supplemental materials) has a more polished visualisation and is colour-coded according to time, whereas the generated figure is more 'plain-looking':
Please note that the -d
parameter in the script above is crucial, as it
removes the first time steps (during which a blank screen is shown).
In the other scripts, this removal happens automatically, but here, we
provide an option for it (omitting it has no influence on the
calculation at other time steps, but the resulting curve will look
slightly different from the ones that we report in Figure A.4 of the
paper). As usual, you can also run this script for the persistence
images of the other brain masks.
The corresponding curves are stored in results/across_cohort_variability.
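As a rough illustration of what such a curve measures, here is a speculative sketch; the actual metric used by analyse_variability_across_cohorts.py may differ:

import numpy as np

def variability_curve(cohort_images, drop_first=0):
    # cohort_images: array of shape (cohorts, time steps, features),
    # e.g. one mean persistence image per cohort and time step. The
    # drop_first parameter mimics the -d option that removes the
    # initial time steps during which a blank screen is shown.
    cohort_images = cohort_images[:, drop_first:, :]
    # Variance across cohorts, averaged over the image features.
    return cohort_images.var(axis=0).mean(axis=-1)

cohorts = np.random.rand(5, 168, 400)  # stand-in data
print(variability_curve(cohorts, drop_first=8).shape)  # (160,)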
Finally, to convert these curves into the variability histograms that we depict in Figure 4, please use the following code:
python peri_histogram_analysis.py ../results/across_cohort_variability/brainmask_sigma1.0_r20.csv
python peri_histogram_analysis.py ../results/across_cohort_variability/occipitalmask_sigma1.0_r20.csv
python peri_histogram_analysis.py ../results/across_cohort_variability/xormask_sigma1.0_r20.csv
This will result in a histogram like this one (here, the whole-brain mask is used):
Finally, to show the details of the bootstrap experiments concerning the variability analysis, you can use the following code:
python peri_histogram_bootstrap_analysis.py ../results/across_cohort_variability/brainmask_sigma1.0_r20.csv
python peri_histogram_bootstrap_analysis.py ../results/across_cohort_variability/occipitalmask_sigma1.0_r20.csv
python peri_histogram_bootstrap_analysis.py ../results/across_cohort_variability/xormask_sigma1.0_r20.csv
By default, this will draw 1000 bootstrap samples and visualise the corresponding histogram. Here is such a histogram (calculated based on the occipital-temporal mask), with the original value of the parameter being highlighted as a dotted vertical line:
The command-line output will show the numbers that we report in the paper. For example, the analysis based on the occipital-temporal mask will yield the following output:
python peri_histogram_bootstrap_analysis.py -b 1000 ../results/across_cohort_variability/occipitalmask_sigma1.0_r20.csv
Bootstrap: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:15<00:00, 63.93it/s]
Theta_0: -0.0226
ASL (left) : 0.95
ASL (right): 0.04
Hence, the parameter's achieved significance level is significant at the 0.05 level.
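For context, the reported ASL values can be read off the bootstrap distribution as follows; this sketch only shows the final step, and how the replicates are generated follows the script, which we do not restate here:

import numpy as np

def achieved_significance_level(theta_0, replicates):
    # The ASL is the fraction of bootstrap replicates that fall on
    # either side of the original statistic theta_0.
    asl_left = np.mean(replicates <= theta_0)
    asl_right = np.mean(replicates >= theta_0)
    return asl_left, asl_right

replicates = np.random.default_rng(0).normal(size=1000)  # stand-in
print(achieved_significance_level(-0.0226, replicates))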
The repository contains more code that we are unable to fully discuss or acknowledge in the paper for space reasons. Please feel free to browse around; we have documented most of the relevant scripts, making it possible to get a glimpse of the full pipeline.
The code pertaining to this project is released under a BSD-3-Clause License. This license does not apply to the data itself, which is released under the PDDL.