TranSpaAnalysis

Analysis notebooks reported in Manuscript:

Reliable imputation of spatial transcriptome with uncertainty estimation and spatial regularization

For replication please follow the following steps:

0. Preparation

0.1 Install required packages

Please refer to repo Transpa for installing tranSpa,
We have SpaGE included in this repo, but Tangram, stPlus, and SparkX should be installed following their software documents:
Note that SparkX is an R package, make sure you change the R software bin path in line !source /home/cqiao/.bashrc; ~/R/bin/Rscript sparkX.r intestine5fold to your working R environment when executing the following notebooks:
- imputation_base_intestine.ipynb
- imputation_base_melanoma.ipynb
- imputation_base_breastcancer.ipynb
- imputation_base_mouseliver.ipynb
scvelo is required for RNA Velocity analysis

0.2 Get raw data

All the ST and SC datasets can be downloaded from Zenodo;
the raw data are compressed as data.tar.gz

after downloading data.tar.gz (may take a while), extract files using the following command (linux):

tar -xvzf data.tar.gz

If you do not want to change the file paths in the codes, please organize the project directory as:

RootFolder
    |-- analysis
    |        |-- TranSpaAnalysis
    |                    |- README.md
    |                    |- preprocess_datasets.sh
    |                    |  ... 
    |-- data
    |-- output

It is also possible to have a quick inspection for the reported plots by running plotting notebooks with our generated data, output.tar.gz, please extract it to the specified location above.

0.3 Run preprocessing

Run preprocessing for all the ST and SC datasets, so that our codes and notebooks can run more efficiently.

sh ./preprocess_datasets.sh

This will run all the preprocess_*.py scripts. The preprocessed data would be stored in ../../output/preprocessed_dataset (relative location based on the default project organization shown above).

The intestine ST data seemed to be corrupted, please find the re-uploaded version intest_ST.tar.gz and extract it to replace ../../data/ST/intest/A1.h5ad

Note for the human breastcancer dataset processed by preprocess_breastcancer.py, because the genes of reference single cell data are annotated with gene ids, to convert the ids back to gene names, pyensemble is required and GRCh38 version 108 should be downloaded before executing the preprocessing script. Below is the code to download GRCh38 v108:

pip install pyensembl
export PYENSEMBL_CACHE_DIR=./
pyensembl install --release 108 --species human

This will download GRCh38 v108 to ./pyensembl. Then, in preprocess_breastcancer.py we need to set pyensembl cache directory to ./ under which the downloaded data locate. This is achieved via the line os.environ['PYENSEMBL_CACHE_DIR'] = './' in preprocess_breastcancer.py

For easier access, users can use our preprocessed results uploaded to output.tar.gz

1. Run Experiments:

1.1 Executing ST imputation notebooks:

imputation_base_seqfish_singlecell.ipynb
imputation_base_osmFISH_AllenVISp.ipynb
imputation_base_merfish_moffit.ipynb
imputation_base_starmap_AllenVISp.ipynb

1.2 Results plotting:

plotting_merfish.ipynb
plotting_osmfish.ipynb
plotting_seqfish.ipynb
plotting_starmap.ipynb

1.3 Run efficiency benchmarking

sh ./eval.sh

1.4 Summary plotting

plotting.ipynb

1.5 Explore imputed non-probed genes on SeqFISH:

seqfish_exploration.ipynb

1.6 Executing imputation & downstream analysis on Visium datasets

Please remember to change the R bin path in the line !source /home/cqiao/.bashrc; ~/R/bin/Rscript sparkX.r intestine5fold to your working R environment.

imputation_base_intestine.ipynb
imputation_base_melanoma.ipynb
imputation_base_breastcancer.ipynb
imputation_base_mouseliver.ipynb

1.7 Results plotting for Visium experiments:

visium_eval.ipynb

1.8 ST RNA velocity estimation:

Please install scvelo before running the notebooks.

transvelo_chickenheart.ipynb
transvelo_mousebrain.ipynb

qiaochen/TranSpaAnalysis