We propose xSiGra, an interpretable graph-based AI model designed to elucidate interpretable features of identified spatial cell types by harnessing multi-modal features from spatial imaging technologies. xSiGra employs a hybrid graph transformer model for spatial cell type identification, constructing a spatial cellular graph with immunohistology images and gene expression as node attributes. xSiGra uses a novel variant of the Grad-CAM component to uncover interpretable features, including pivotal genes and cells for various cell types, facilitating deeper biological insights from spatial data.
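To make the idea of a spatial cellular graph concrete, here is a minimal sketch, not the xSiGra implementation. It assumes a k-nearest-neighbor rule over cell centroids, which may differ from how xSiGra builds its graph; `build_spatial_graph`, the toy coordinates, and the attribute values are all hypothetical.

```python
import math


def build_spatial_graph(coords, k=3):
    """Return undirected edges linking each cell to its k nearest neighbors."""
    edges = set()
    for i, ci in enumerate(coords):
        # Rank every other cell by Euclidean distance to cell i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        for j in others[:k]:
            # Store each edge once, with the smaller index first.
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Hypothetical toy data: three cells with 2-D centroids.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

# Each node carries two modalities as attributes (values are made up):
# a gene-expression vector and an image-derived feature vector.
node_attrs = [
    {"expr": [3.0, 0.0], "image": [0.2, 0.8]},
    {"expr": [1.0, 2.0], "image": [0.5, 0.5]},
    {"expr": [0.0, 4.0], "image": [0.9, 0.1]},
]

edges = build_spatial_graph(coords, k=1)
print(edges)  # [(0, 1), (1, 2)]
```

A graph neural network would then take `edges` as connectivity and `node_attrs` as per-cell inputs.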
xSiGra is built using PyTorch. Tested on: Red Hat Enterprise Linux Server 7.9 (Maipo), NVIDIA Tesla V100 GPU, Intel(R) Xeon(R) CPU E5-2680 v3 (2.50 GHz, 12 cores), 64 GB, Python 3.10.9, R 4.2.1, CUDA 11.7.
- Requirements
- Installation
- Dataset
- Folder Structure
- Tutorial for xSiGra
- Pre-processing scripts
- Reproduction instructions
Required modules can be installed from requirements.txt in the project root:
pip install -r requirements.txt
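After installation, a quick check can confirm that the interpreter and key modules are visible. This is a minimal sketch using only the standard library; the helper names are hypothetical, and only `torch` is listed because it is the one dependency this README names. Extend the list from requirements.txt as needed.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def python_matches(major, minor):
    """True if the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)


# Assumption: "torch" is the import name of PyTorch; other entries should
# be taken from requirements.txt.
required = ["torch"]

print("Python 3.10:", python_matches(3, 10))
print("Missing modules:", missing_modules(required))
```

A version mismatch is not necessarily an error; 3.10 is simply the version the code was tested on.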
Check the list in the following section:
Download xSiGra:
git clone https://github.com/asbudhkar/xSiGra
The datasets can be downloaded here
Check the required folder structure in the following section:
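Before running the tutorial steps, it can help to confirm that the data and checkpoint folders are where the commands expect them. This is a minimal sketch, assuming only the relative paths that appear in the commands in this README; the layout documented in the repository is authoritative, and `missing_dirs` is a hypothetical helper.

```python
from pathlib import Path

# Paths inferred from the --root and --save_path arguments used in this
# README (relative to the project root). This list is an assumption and
# may be incomplete.
EXPECTED_DIRS = [
    "dataset/nanostring/lung13",
    "dataset/10x",
    "checkpoint/nanostring_train_lung13",
]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in expected if not (root / d).is_dir()]


# Replace the placeholder with the actual clone location.
print(missing_dirs("/path/to/xSiGra"))
```

An empty list means every folder in `EXPECTED_DIRS` was found.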
- Data processing: here
- Run xSiGra: here
- xSiGra cluster visualization: here
- Compute explanations: here
- Compute explanations detailed: here
- Visualize explanations: here
Go to /path/to/xSiGra/xSiGra_model
# for NanoString CosMx dataset
python3 processing.py --dataset nanostring
# for 10x Visium dataset
python3 processing.py --dataset 10x
Go to /path/to/xSiGra/xSiGra_model
Download the datasets and checkpoints and put them in the folders described above.
The results will be stored in "/path/to/xSiGra/cluster_results_gradcam_gt1"
python3 train_nanostring.py --test_only 1 --dataset lung13 --root ../dataset/nanostring/lung13 --save_path ../checkpoint/nanostring_train_lung13 --seed 1234 --epochs 200 --lr 1e-3 --num_fov 20 --device cuda:0
You can also use the bash script to test all slices:
sh test_nanostring.sh
The results will be stored in "/path/to/xSiGra/10x_results/"
python3 train.py --test_only 1 --lr $lr --epochs $epoch --id 151676 --seed $seed --repeat $repeat --ncluster 7 --save_path $sp --dataset 10x --cluster_method mclust --root ../dataset/10x/
You can also use the bash script to test all slices:
sh test_visium.sh
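The 10x Visium command above relies on the shell variables `$lr`, `$epoch`, `$seed`, `$repeat`, and `$sp`, which must be set before it is run. As a minimal sketch, the same command can be assembled in Python with explicit values. The flags are copied from the command above; the values are placeholders (learning rate, epochs, and seed are borrowed from the NanoString example, while `repeat=1` and the save path are hypothetical), and `visium_command` is not part of xSiGra.

```python
import shlex


def visium_command(sample_id, lr, epochs, seed, repeat, save_path, test_only=True):
    """Build the train.py argument list for one 10x Visium slice."""
    cmd = ["python3", "train.py"]
    if test_only:
        cmd += ["--test_only", "1"]
    cmd += [
        "--lr", str(lr),
        "--epochs", str(epochs),
        "--id", str(sample_id),
        "--seed", str(seed),
        "--repeat", str(repeat),
        "--ncluster", "7",
        "--save_path", save_path,
        "--dataset", "10x",
        "--cluster_method", "mclust",
        "--root", "../dataset/10x/",
    ]
    return cmd


# Placeholder values; adjust them to match the checkpoints being used.
cmd = visium_command(
    "151676", lr=1e-3, epochs=200, seed=1234, repeat=1,
    save_path="../checkpoint/10x_demo",  # hypothetical path
)
print(shlex.join(cmd))
```

To execute it, pass `cmd` to `subprocess.run(cmd, check=True)` from inside the xSiGra_model directory.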
The results will be stored in "/path/to/xSiGra/10x_results/"
python3 train.py --test_only 1 --lr $lr --epochs $epoch --id 151676 --seed $seed --repeat $repeat --ncluster 7 --save_path $sp --dataset 10x --cluster_method mclust --root ../dataset/10x/
python3 train.py --dataset human_breast_cancer --test_only 1
python3 train.py --dataset mouse_brain_anterior --test_only 1
python3 train.py --dataset mouse_brain_coronal --test_only 1
Go to /path/to/xSiGra/xSiGra_model
Download the datasets and checkpoints and put them in the folders described above. Download lung13_adata_pred.h5ad and put it in the saved_adata folder as above.
The results will be stored in "/path/to/xSiGra/cluster_results_gradcam_gt1"
python3 visualize_nanostring.py --test_only 1 --dataset lung13 --root ../dataset/nanostring/lung13 --save_path ../checkpoint/nanostring_train_lung13 --seed 1234 --epochs 200 --lr 1e-3 --num_fov 20 --device cuda:0
You can also use the bash script to test all slices:
sh visualize_nanostring.sh
python3 train_nanostring.py --dataset lung13 --root ../dataset/nanostring/lung13 --save_path ../checkpoint/nanostring_train_lung13 --seed 1234 --epochs 200 --lr 1e-3 --num_fov 20 --device cuda:0
python3 train.py --lr $lr --epochs $epoch --id 151676 --seed $seed --repeat $repeat --ncluster 7 --save_path $sp --dataset 10x --cluster_method mclust --root ../dataset/10x/
You can also use the bash script to train all slices:
sh train_visium.sh
python3 train.py --dataset human_breast_cancer
python3 train.py --dataset mouse_brain_anterior
python3 train.py --dataset mouse_brain_coronal
Go to /path/to/xSiGra/explanability_benchmarking
Download lung13_adata_pred.h5ad and put it in saved_adata as above, or use the anndata computed in the testing step above.
The results will be stored in "/path/to/xSiGra/cluster_results_{benchmark_name}_gt1"
python3 compute_explanations.py --test_only 1 --dataset lung13 --root ../dataset/nanostring/lung13 --save_path ../checkpoint/nanostring_train_lung13 --seed 1234 --epochs 200 --lr 1e-3 --num_fov 20 --device cuda:0
python3 explainability_benchmarks.py --benchmark deconvolution
You can also use the bash script to compute explanations for the different benchmarks for lung13. Change the input parameters to compute explanations for other lung cancer slides.
sh compute_explanations.sh
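For intuition about what a Grad-CAM-style explanation computes, here is a minimal sketch of the standard Grad-CAM scoring rule applied to per-node features. It is not xSiGra's novel variant and not the code in compute_explanations.py; the function name and toy numbers are hypothetical. Each feature channel is weighted by the mean gradient of the class score with respect to that channel, and a ReLU keeps only positive evidence.

```python
def grad_cam_scores(activations, gradients):
    """Standard Grad-CAM importance per node.

    activations, gradients: lists of K channels, each a list of N per-node
    values (gradients are d(class score)/d(activation)).
    """
    # Channel weight alpha_k: gradient averaged over all nodes.
    alphas = [sum(g) / len(g) for g in gradients]
    n_nodes = len(activations[0])
    scores = []
    for n in range(n_nodes):
        weighted = sum(a * ch[n] for a, ch in zip(alphas, activations))
        scores.append(max(0.0, weighted))  # ReLU
    return scores


# Toy example: 2 channels, 2 nodes (made-up numbers).
activations = [[1.0, 4.0], [3.0, 2.0]]
gradients = [[2.0, 0.0], [-0.5, -0.5]]
print(grad_cam_scores(activations, gradients))  # [0.0, 3.0]
```

Here the second node receives a positive score, while the first is clipped to zero by the ReLU.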
Go to /path/to/xSiGra/explanability_evaluation
Two metrics are used: Fidelity and Contrastivity.
Download lung13_adata_pred.h5ad and put it in saved_adata as above, or use the anndata computed in the testing step above. The explanations for the different benchmarks need to be computed first and stored in the folder structure explained in the step above.
The results for lung13 will be stored in "/path/to/xSiGra/explainability_evaluation/fidelity_results", "/path/to/xSiGra/explainability_evaluation/fidelity_plots", "/path/to/xSiGra/explainability_evaluation/contrastivity_plots"
# Fidelity
python3 compute_fidelity_all.py --benchmark deconvolution
python3 compute_fidelity_mask.py --benchmark deconvolution
# Contrastivity
python3 evaluate_contrastivity.py
You can also use the bash script to compute fidelity for all benchmarks for lung13. Change the input parameters to compute explanations for other lung cancer slides.
sh compute_fidelity.sh
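This README does not define the two metrics, so the following is a minimal sketch under common definitions from the graph-explainability literature, which may differ from what the scripts above compute. Fidelity is taken as the drop in accuracy after masking the features an explanation marks as important. Contrastivity is taken as the Hamming distance between the binarized explanations of two classes, normalized by the number of features selected by either. All function names and toy values are hypothetical.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def fidelity(labels, preds_full, preds_masked):
    """Accuracy drop after masking the features flagged as important."""
    return accuracy(labels, preds_full) - accuracy(labels, preds_masked)


def contrastivity(mask_a, mask_b):
    """Normalized Hamming distance between two binary explanation masks."""
    hamming = sum(a != b for a, b in zip(mask_a, mask_b))
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return hamming / union if union else 0.0


# Toy example (made-up values).
labels = [0, 1, 1, 0]
print(fidelity(labels, [0, 1, 1, 0], [0, 0, 1, 1]))  # 0.5
print(contrastivity([1, 0, 1, 0], [1, 1, 0, 0]))
```

Under these definitions, higher fidelity means the masked features mattered more to the prediction, and higher contrastivity means the explanations for the two classes overlap less.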
Go to /path/to/xSiGra/downstream_analysis
Gene enrichment analysis and cell-cell interaction analysis are performed.
Download lung13_adata_pred.h5ad and put it in saved_adata as above, or use the anndata computed in the testing step above. The explanations for the different benchmarks need to be computed first and stored in the folder structure explained in the step above. Download xSiGra explanations
The results for lung13 will be stored in "/path/to/xSiGra/downstream_analysis/enrichment_results". Change the input parameters to compute explanations for other lung cancer slides. The results can be analysed further with software such as GSEA (Gene Set Enrichment Analysis Workbench) or scanpy to gain biological insights.
# Gene enrichment analysis
python3 gene_enrichment.py
# Cell-Cell interaction analysis
python3 lr_significant.py
python3 lr_significant1.py
python3 lr_significant2.py
python3 lr_significant3.py
python3 test_significant_pairs.py
Please cite our paper if you use this code in your own work:
Aishwarya Budhkar, Ziyang Tang, Xiang Liu, Xuhong Zhang, Jing Su, Qianqian Song, xSiGra: explainable model for single-cell spatial data elucidation, Briefings in Bioinformatics, Volume 25, Issue 5, September 2024, bbae388, https://doi.org/10.1093/bib/bbae388