Recent advances in spatial sequencing technologies enable simultaneous capture of spatial location and chromatin accessibility of cells within intact tissue slices. Identifying peaks that display spatial variation and cellular heterogeneity is the first and key analytic task for characterizing the spatial chromatin accessibility landscape of complex tissues. Here we propose Descartes, an efficient iterative model for identifying spatially variable (SV) peaks based on the graph of inter-cellular correlations. Through comprehensive benchmarking on 16 tissue slices from 4 published datasets, we demonstrate the superiority of Descartes in accurately identifying SV peaks from three perspectives: facilitating clustering performance, capturing domain-specific signals, and maintaining spatial continuity. In terms of computational efficiency, Descartes also outperforms existing methods with spatial assumptions. Utilizing the graph of inter-cellular correlations, Descartes denoises and imputes data via neighboring relationships, enhancing the precision of downstream analysis. We further demonstrate the ability of Descartes to identify peak modules by using peak-peak correlations within the graph. When applied to spatial multi-omics data, Descartes shows its potential to detect gene-peak interactions, offering valuable insights into the construction of gene regulatory networks.
-
We recommend building a Python virtual environment with Anaconda. If Anaconda (or Miniconda) is already installed with Python 3, skip straight to creating the environment.
-
Create and activate a new virtual environment:
$ conda create -n descartes python=3.8
$ conda activate descartes
Python packages required by Descartes are listed below:
1. Python 3.8.18
2. Packages for Descartes and tutorial
anndata >= 0.9.2
matplotlib >= 3.7.4
numpy >= 1.22.4
pandas >= 1.4.3
scanpy == 1.9.6
scikit-learn >= 1.3.0
scipy >= 1.8.0
seaborn >= 0.12.2
Clone the repository and install the requirements:
$ git clone https://github.com/likeyi19/Descartes
$ cd Descartes
$ pip install -r requirements.txt
Install Descartes:
$ pip install decare
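After installation, it can be useful to confirm that the environment matches the package list above. The following is a minimal sketch using only the Python standard library; the helper names `version_tuple` and `check_environment` are hypothetical and not part of Descartes, and the comparison is a simple numeric one that treats the pinned scanpy version as a lower bound.

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
# scanpy is pinned to an exact version in that list; this sketch only
# checks it as a lower bound, which is a simplification.
MINIMUMS = {
    "anndata": "0.9.2",
    "matplotlib": "3.7.4",
    "numpy": "1.22.4",
    "pandas": "1.4.3",
    "scanpy": "1.9.6",
    "scikit-learn": "1.3.0",
    "scipy": "1.8.0",
    "seaborn": "0.12.2",
}


def version_tuple(version):
    """Turn '1.22.4' into (1, 22, 4); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad so that '1.10' compares like '1.10.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, required, ok)}."""
    report = {}
    for name, required in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = (
            installed is not None
            and version_tuple(installed) >= version_tuple(required)
        )
        report[name] = (installed, required, ok)
    return report


# Print one line per package so problems are easy to spot.
for name, (installed, required, ok) in check_environment().items():
    status = "ok" if ok else "missing or too old"
    print(f"{name:15s} installed={installed} required>={required} {status}")
```

Run it inside the activated `descartes` environment; any line marked "missing or too old" points to a package that should be reinstalled.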
We provide a slice of mouse brain as a sample dataset, which can be downloaded at: https://cloud.tsinghua.edu.cn/d/71c9840593464e9fa122/
We provide four Jupyter notebooks to demonstrate the functions of our method: SV peak selection, data imputation, peak module identification, and gene-peak interaction detection.
The command-line script accepts sixteen parameters: the path of the dataset, the save path for results, the chosen number of peaks, the random seed, the TF-IDF computation method, the number of principal components (PCs), the quantity of K means, the similarity calculation method, the iteration count, the spatial neighborhood selection approach, the number of neighbors, the spatial strategy for score calculation, the peak filtering method, the quantity of peak filtering, the distance calculation method, and the data synthesis ratio.
For example:
$ cd code/
$ python descartes.py -fp ../data/scanpy.h5ad -sp ../result -n 10000 -sb 1 -pc 10 -k 20 -iter 4 -nb 5 -r 0.4
$ cd ..
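When the same command has to be run several times with different settings, the argument list can be assembled in Python instead of typed by hand. The sketch below only builds the command; the helper names `build_command` and `with_overrides`, the seed sweep, and the per-seed result directories are illustrative assumptions rather than part of Descartes. Only flags that appear in the example command above are used.

```python
import shlex

# Values taken from the example command above; keys are the short flags
# listed in the help output of descartes.py.
EXAMPLE_OPTIONS = {
    "fp": "../data/scanpy.h5ad",
    "sp": "../result",
    "n": 10000,
    "sb": 1,
    "pc": 10,
    "k": 20,
    "iter": 4,
    "nb": 5,
    "r": 0.4,
}


def build_command(options, script="descartes.py", python="python"):
    """Return the argument list for one run, e.g. for subprocess.run."""
    command = [python, script]
    for flag, value in options.items():
        command += [f"-{flag}", str(value)]
    return command


def with_overrides(base, **overrides):
    """Copy the base options and replace selected values."""
    merged = dict(base)
    merged.update(overrides)
    return merged


# Hypothetical sweep: one command per random seed, each writing to its
# own result directory (the directory layout is an assumption).
commands = [
    build_command(
        with_overrides(EXAMPLE_OPTIONS, sb=seed, sp=f"../result/seed_{seed}")
    )
    for seed in (1, 2, 3)
]

for command in commands:
    # shlex.join renders the list as a copy-pasteable shell line.
    print(shlex.join(command))
```

Each list can then be passed to `subprocess.run(command, check=True, cwd="code")` from the repository root, which mirrors the `cd code/` step of the example. Whether the script creates missing result directories is not stated here, so create them beforehand if in doubt.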
To see the full list of options, run:
$ python code/descartes.py -h
usage: descartes.py [-h] [-fp FILE_PATH] [-sp SAVE_PATH] [-n NUM_SELECT_PEAK]
[-sb SEED_BASE] [-tf TF_IDF] [-pc PC_NUMBER] [-k K_NUMBER]
[-s SIMILARITY] [-iter ITER_TIME] [-spm SP_METHOD]
[-nb NEIGHBOR] [-spd SP_DIST] [-ps PRE_SELECT]
[-pn PEAKS_NUM] [-d DISTANCE] [-r RATIO]
optional arguments:
-h, --help show this help message and exit
-fp FILE_PATH, --file_path FILE_PATH
The path of dataset
-sp SAVE_PATH, --save_path SAVE_PATH
The save path for results
-n NUM_SELECT_PEAK, --num_select_peak NUM_SELECT_PEAK
The chosen number of peaks, defaults to 10000
-sb SEED_BASE, --seed_base SEED_BASE
The random seed
-tf TF_IDF, --TF_IDF TF_IDF
The TF-IDF computation method
-pc PC_NUMBER, --pc_number PC_NUMBER
The number of principal components
-k K_NUMBER, --k_number K_NUMBER
The quantity of K means
-s SIMILARITY, --similarity SIMILARITY
The similarity calculation method
-iter ITER_TIME, --iter_time ITER_TIME
The iteration count, defaults to 4
-spm SP_METHOD, --sp_method SP_METHOD
The spatial neighborhood selection approach
-nb NEIGHBOR, --neighbor NEIGHBOR
The number of neighbors
-spd SP_DIST, --sp_dist SP_DIST
The spatial strategy for score calculation
-ps PRE_SELECT, --pre_select PRE_SELECT
Peak filtering method
-pn PEAKS_NUM, --peaks_num PEAKS_NUM
The quantity of peak filtering
-d DISTANCE, --distance DISTANCE
The distance calculation method
-r RATIO, --ratio RATIO
Data synthesis ratio
If you have any questions, please contact me by email: lky23@mails.tsinghua.edu.cn