STAMarker: Identify Spatial Domain-specific Variable Genes via Ensemble Graph Attention Autoencoders

STAMarker: Determining spatial domain-specific variable genes with saliency maps in deep learning

Overview

STAMarker is a three-stage framework consisting of an ensemble of graph attention auto-encoders (STAGATE), an ensemble of MLP classifiers, and saliency map computation via backpropagated gradients.

Given the spatial transcriptomics data of a tissue section, STAMarker first trains an ensemble of M STAGATE graph attention auto-encoders to learn low-dimensional latent embeddings of the spots. It clusters each set of embeddings to obtain M grouping results, computes the clustering connectivity matrix from them, and applies hierarchical clustering to that matrix to obtain the spatial domains. STAMarker then models the relationship between the embeddings of the M auto-encoders and the spatial domains by training M base classifiers. Finally, it computes the saliency maps by stacking each encoder with its corresponding classifier and backpropagating the gradient to the input spatial transcriptomics matrix. The domain-specific SVGs are selected based on the genes' saliency scores in each spatial domain.

Figure: Framework of STAMarker.
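The consensus step described above (a connectivity matrix built from M clusterings, followed by hierarchical clustering) can be illustrated with a small dependency-free sketch. It is not the STAMarker implementation: the function names `connectivity_matrix` and `average_linkage` are hypothetical, and the use of average linkage on the distance `1 - connectivity` is an assumption, since the text above does not specify the linkage.

```python
from itertools import combinations


def connectivity_matrix(labelings):
    """Fraction of runs in which each pair of spots shares a cluster."""
    n_runs = len(labelings)
    n_spots = len(labelings[0])
    counts = [[0] * n_spots for _ in range(n_spots)]
    for labels in labelings:
        for i in range(n_spots):
            for j in range(n_spots):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


def average_linkage(conn, n_domains):
    """Agglomerative clustering on the distance 1 - connectivity (assumed linkage)."""
    clusters = [[i] for i in range(len(conn))]

    def distance(a, b):
        total = sum(1.0 - conn[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_domains:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(conn)
    for domain, members in enumerate(clusters):
        for spot in members:
            labels[spot] = domain
    return labels


# Three toy clusterings of six spots. Label names differ between runs,
# so the runs are compared through co-membership rather than raw labels.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
conn = connectivity_matrix(runs)
consensus = average_linkage(conn, n_domains=2)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

Because the connectivity matrix only records whether two spots were grouped together, it is unaffected by the arbitrary numbering of clusters in each run, which is what makes it a convenient input for combining M independent clusterings.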

Step by step installation

Please refer to the step-by-step guide for installation.

Usage

The pipeline of STAMarker is wrapped in the core class STAMarker.

from stamarker.dataset import SpatialDataModule
from stamarker.pipeline import STAMarker, make_spatial_data
from stamarker.utils import select_svgs
# Build the data module from `ann_data`, a scanpy `AnnData` object holding a spatial transcriptomics dataset.
data_module = make_spatial_data(ann_data)
# Prepare the data: filter out spots and genes, then initialize the spot-spot graph
data_module.prepare_data(rad_cutoff=40, n_top_genes=3000, min_counts=20)

Now we are ready to run STAMarker stage by stage.

# Initialize the `STAMarker` class.
# NOTE: `config` is not defined in this snippet; it must be created beforehand
# (see the configuration yaml file description under Tutorial).
n_auto_encoders = 20
stamarker = STAMarker(n_auto_encoders, "output_folder", config)
# Stage 1: train the graph attention auto-encoders and save the models to `output_folder`
stamarker.train_auto_encoders(data_module)
# Stage 2: cluster the spots by the learned embeddings and perform consensus clustering
n_class = 5       # number of spatial domains
resolution = 0.6  # resolution passed to the Louvain clustering
stamarker.clustering(data_module, "louvain", resolution)
stamarker.consensus_clustering(n_class)
# Stage 3: train the classifiers based on the consensus labels
stamarker.train_classifiers(data_module, n_class)

We can then compute the saliency maps and load the consensus labels with the following code:

import numpy as np

smaps = stamarker.compute_smaps(data_module)
consensus_labels = np.load(stamarker.save_dir + "/consensus_labels.npy")
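To make the idea of a gradient-based saliency map concrete, the sketch below computes the gradient of one domain's classifier output with respect to the input genes for a toy stacked model. It is an illustration only and does not reproduce what `compute_smaps` does internally: the tanh encoder, the linear classifier, the weights, and the function names `forward` and `saliency` are all assumptions, and the gradient is written out by hand where a deep learning framework would use automatic differentiation.

```python
import math


def forward(x, enc_w, clf_w):
    """Toy stacked model: tanh encoder followed by a linear classifier."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc_w]
    logits = [sum(v * zk for v, zk in zip(row, z)) for row in clf_w]
    return z, logits


def saliency(x, enc_w, clf_w, domain):
    """|d logit_domain / d x_g| for every input gene g, by the chain rule."""
    z, _ = forward(x, enc_w, clf_w)
    # Gradient at the latent layer, pushed back through tanh: v_k * (1 - z_k^2).
    delta = [clf_w[domain][k] * (1.0 - z[k] ** 2) for k in range(len(z))]
    # Gradient at the input: sum over latent units of delta_k * enc_w[k][g].
    return [abs(sum(delta[k] * enc_w[k][g] for k in range(len(z))))
            for g in range(len(x))]


# Hypothetical weights: 3 genes -> 2 latent units -> 2 domains.
enc_w = [[0.5, -0.2, 0.0],
         [0.1, 0.4, 0.0]]
clf_w = [[1.0, -1.0],
         [-0.5, 0.8]]
spot = [1.0, 2.0, 3.0]  # expression of the 3 genes in one spot

scores = saliency(spot, enc_w, clf_w, domain=0)
print(scores)
```

The third gene has zero weight in the toy encoder, so its saliency for every domain is zero: changing its expression cannot change the classifier output. Genes whose expression strongly moves the output for a domain receive large scores.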

The domain-specific SVGs of spatial domain 0 can then be selected with a single call:

svg_domain0 = select_svgs(smaps, 0, consensus_labels, alpha=1.5)
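The meaning of `alpha` is not spelled out here. One plausible reading, shown in the sketch below, is a cutoff expressed in standard deviations above the mean saliency score of a domain. This is an assumption for illustration, not a description of how `select_svgs` is implemented; the helper `select_above_threshold` and the scores are hypothetical.

```python
from statistics import mean, pstdev


def select_above_threshold(gene_scores, alpha=1.5):
    """Return indices of genes scoring above mean + alpha * std (assumed rule)."""
    mu = mean(gene_scores)
    sigma = pstdev(gene_scores)
    cutoff = mu + alpha * sigma
    return [g for g, s in enumerate(gene_scores) if s > cutoff]


# Hypothetical aggregated saliency scores for eight genes in one domain.
domain_scores = [0.10, 0.12, 0.09, 0.11, 0.95, 0.10, 0.13, 0.08]
print(select_above_threshold(domain_scores, alpha=1.5))  # [4]
```

Under this reading, a larger `alpha` gives a stricter cutoff and therefore fewer selected genes; with `alpha=3.0` none of the toy scores above pass.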

Trained models

The trained models are available here.

Tutorial

Please also refer to the configuration yaml file description.

Citation

@article{Zhang2023,
    author = {Zhang, Chihao and Dong, Kangning and Aihara, Kazuyuki and Chen, Luonan and Zhang, Shihua},
    title = {STAMarker: determining spatial domain-specific variable genes with saliency maps in deep learning},
    journal = {Nucleic Acids Research},
    pages = {gkad801},
    year = {2023},
    month = {10},
    issn = {0305-1048},
    doi = {10.1093/nar/gkad801},
}

Contact

If you have any problems with STAMarker, please contact zhangchihao11@outlook.com.