This code projects single-cell RNA-seq profiles from one dataset into the UMAP embedding coordinates of a different dataset, using Spearman's correlation as the similarity measure. Spearman's correlation is particularly useful in this context because 1) the cluster_diffex pipeline uses Spearman's correlation as its similarity metric and 2) Spearman's correlation is non-parametric (it depends only on rank order), which allows projection of scRNA-seq data generated by methods completely different from those used for the original embedding. For example, one could project SMART-seq data (e.g. TPM data) onto a UMAP embedding generated using 10x Genomics Chromium data (e.g. molecular counting data). This repository includes code for computing the transformation and generating simple figures.
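To illustrate the idea, here is a minimal sketch of a rank-based projection. It is not taken from umap_transform.py: the function name `spearman_knn_project`, the choice to average the embedding coordinates of the k most similar reference cells, and the input layout (genes in rows, cells in columns) are all assumptions made for this example.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_knn_project(ref, query, ref_embedding, k=5):
    """Hypothetical helper: place query cells in a reference embedding.

    ref           : genes x ref_cells expression matrix
    query         : genes x query_cells expression matrix (same gene order)
    ref_embedding : ref_cells x 2 array of reference UMAP coordinates
    k             : number of most-correlated reference cells to average
    """
    # Rank genes within each cell; Spearman's correlation is the
    # Pearson correlation of these ranks (ties receive average ranks).
    ref_r = rankdata(ref, axis=0)
    qry_r = rankdata(query, axis=0)

    # Standardize the ranks per cell so a dot product gives the correlation.
    # Assumes no cell has identical values for every gene (zero variance).
    ref_z = (ref_r - ref_r.mean(axis=0)) / ref_r.std(axis=0)
    qry_z = (qry_r - qry_r.mean(axis=0)) / qry_r.std(axis=0)

    # corr[i, j] = Spearman correlation between query cell i and ref cell j.
    corr = qry_z.T @ ref_z / ref.shape[0]

    # Indices of the k most similar reference cells for each query cell.
    nearest = np.argsort(-corr, axis=1)[:, :k]

    # Assumption of this sketch: average the neighbors' coordinates.
    return ref_embedding[nearest].mean(axis=1)


# Toy data: the query cells are monotone transforms of two reference cells,
# mimicking a change of measurement scale (e.g. counts vs. TPM).
rng = np.random.default_rng(0)
ref = rng.random((50, 8))
embedding = rng.random((8, 2))
query = np.exp(3 * ref[:, [2, 5]])
projected = spearman_knn_project(ref, query, embedding, k=1)
```

Because only the rank order of genes within each cell matters, the rescaled query cells in the toy data are matched to the reference cells they were derived from.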
Requirements:
- Python 3.6 or higher
- NumPy
- scikit-learn
- UMAP (https://github.com/lmcinnes/umap)
- SciPy
- Numba
- Seaborn
Suggested usage:
- Install dependencies.
- Clone this repository.
- Run umap_transform.py. Example usage:
python umap_transform.py -rm REFDATA/REFDATA.matrix.txt -pm QUERY1/QUERY1.matrix.txt QUERY2/QUERY2.matrix.txt -p project_to_REFDATA/project_to_REFDATA --markers markers.txt -k 5
where REFDATA.matrix.txt is a tab-delimited matrix of molecular counts for the reference (the first two columns contain GIDS and gene symbols, and each subsequent column contains counts for one cell), QUERYX.matrix.txt is a matrix of molecular counts for query sample X (same format as REFDATA.matrix.txt), and markers.txt is a one-column list of GIDS used for computing similarity (usually highly variable genes). None of the files should have a header.
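To check that input files match the layout described above before running the script, something like the following sketch may help. The helper names (`load_matrix`, `load_markers`, `subset_to_markers`) are hypothetical and not part of this repository; the code only assumes the file formats stated above (tab-delimited, no header, GIDS in the first column, gene symbols in the second).

```python
import numpy as np


def load_matrix(path):
    """Read a headerless, tab-delimited matrix: GID, symbol, then one column per cell."""
    gids, symbols, rows = [], [], []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            gids.append(fields[0])
            symbols.append(fields[1])
            rows.append([float(value) for value in fields[2:]])
    return gids, symbols, np.array(rows)


def load_markers(path):
    """Read a one-column, headerless list of GIDS."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def subset_to_markers(gids, counts, markers):
    """Keep only the rows whose GID is in the marker list, in marker order."""
    row_of = {gid: i for i, gid in enumerate(gids)}
    keep = [row_of[m] for m in markers if m in row_of]
    return counts[keep]
```

Markers that are absent from the matrix are silently skipped in this sketch; a stricter check could raise an error instead.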