umap_projection

This code projects single-cell RNA-seq profiles from one dataset into the UMAP embedding coordinates of a different dataset, using Spearman's correlation as the similarity measure. Spearman's correlation is particularly useful in this context because 1) the cluster_diffex pipeline uses it as its similarity metric and 2) it is non-parametric (rank-based), which allows projection of scRNA-seq data generated by a completely different method from the one used for the original embedding. For example, one could project SMART-seq data (e.g. TPM data) onto a UMAP embedding generated from 10x Genomics Chromium data (e.g. molecular counting data). This repository includes code for computing the transformation and generating simple figures.
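The rank-based property mentioned above can be illustrated with a small sketch. It is not taken from this repository: the profiles are simulated, and the exponential rescaling is only a stand-in for "a different quantification scale" (real TPM and molecular count data differ by more than a monotone rescaling). It shows that Spearman's correlation is unchanged by a strictly increasing transformation of one profile, while Pearson's correlation is not.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Simulated expression profile of one cell over 200 marker genes (assumption).
counts = rng.poisson(lam=5.0, size=200).astype(float)

# A second, noisy profile of a similar cell.
other = counts + rng.normal(scale=1.0, size=200)

# Strictly increasing rescaling of the second profile; this only stands in
# for data reported on a different scale.
rescaled = np.exp(other / 4.0)

# Spearman depends only on ranks, so the rescaling leaves it unchanged.
rho_raw, _ = spearmanr(counts, other)
rho_rescaled, _ = spearmanr(counts, rescaled)

# Pearson depends on the actual values, so it changes.
r_raw = np.corrcoef(counts, other)[0, 1]
r_rescaled = np.corrcoef(counts, rescaled)[0, 1]

print("Spearman:", rho_raw, rho_rescaled)
print("Pearson: ", r_raw, r_rescaled)
```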

Requirements:

  1. Python 3.6 or higher
  2. NumPy
  3. scikit-learn
  4. UMAP (https://github.com/lmcinnes/umap)
  5. SciPy
  6. Numba
  7. seaborn

Suggested usage:

  1. Install dependencies.

  2. Clone this repository.

  3. Run umap_transform.py. Example usage:

python umap_transform.py -rm REFDATA/REFDATA.matrix.txt -pm QUERY1/QUERY1.matrix.txt QUERY2/QUERY2.matrix.txt -p project_to_REFDATA/project_to_REFDATA --markers markers.txt -k 5

where REFDATA.matrix.txt is a tab-delimited matrix of molecular counts for the reference (the first two columns contain GIDs and gene symbols; each subsequent column contains the counts for one cell), QUERYX.matrix.txt is a matrix of molecular counts for query sample X (same format as REFDATA.matrix.txt), and markers.txt is a one-column list of GIDs used for computing similarity (usually highly variable genes). None of the files should have a header.
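As a minimal sketch of the input format described above, the following reads a headerless, tab-delimited matrix and keeps only the marker rows. The function names load_matrix and select_markers and the identifiers GID1, GID2, GID3 are hypothetical, chosen for illustration; this is not how umap_transform.py itself parses its inputs.

```python
import io
import numpy as np

def load_matrix(handle):
    """Parse a headerless tab-delimited matrix: GID, symbol, then one column per cell."""
    gids, symbols, rows = [], [], []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gids.append(fields[0])
        symbols.append(fields[1])
        rows.append([float(x) for x in fields[2:]])
    return gids, symbols, np.array(rows)

def select_markers(gids, counts, markers):
    """Return the rows of counts whose GID is in markers, in marker order."""
    index = {gid: i for i, gid in enumerate(gids)}
    keep = [index[m] for m in markers if m in index]
    return counts[keep, :]

# Made-up example: three genes, two cells, no header line.
example = "GID1\tGENEA\t3\t0\nGID2\tGENEB\t1\t5\nGID3\tGENEC\t0\t7\n"
gids, symbols, counts = load_matrix(io.StringIO(example))

# A markers file would hold one GID per line; here the list is given inline.
marker_counts = select_markers(gids, counts, ["GID3", "GID1"])
print(counts.shape, marker_counts.shape)
```

For a file on disk, the same function could be called as load_matrix(open(path)), where path points to a matrix in the format described above.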