scRNA-seq integration with triplet neural networks

Primary language: Python. License: MIT.

insct ("Insight")

INtegration of millions of Single Cells using batch-aware Triplet networks

INSCT is a deep learning algorithm that computes an integrated embedding for scRNA-seq data. With INSCT, you can:

  • Integrate scRNA-seq datasets across batches, with or without labels.
  • Generate a low-dimensional representation of the scRNA-seq data.
  • Integrate millions of cells on a personal computer.

For more info check out our manuscript.

How does it work?

[Figure: overview of the triplet network]

  1. INSCT learns a data representation that integrates cells across batches. The network is trained to minimize the distance between the Anchor and the Positive while maximizing the distance between the Anchor and the Negative. The Anchor and the Positive are transcriptionally similar cells from different batches. The Negative is a transcriptionally dissimilar cell sampled from the same batch as the Anchor.
  2. The principal components of the three data points (Anchor, Positive, and Negative) are fed into three identical neural networks that share weights. The triplet loss function is used to train the network weights, and the activations of the two-dimensional embedding layer form the integrated embedding.
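The training objective in the two steps above can be illustrated with the standard margin-based triplet loss. This is a minimal sketch in plain Python, not INSCT's own code: the Euclidean distance, the `margin` value, and the function names `euclidean` and `triplet_loss` are assumptions made for illustration, and the exact loss variant INSCT uses is not stated here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (assumed form, for illustration).

    The loss is zero once the Negative is at least `margin` farther
    from the Anchor than the Positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings (made-up values).
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]

# Well-separated triplet: 1 - 5 + 1 < 0, so the loss is clipped to 0.0.
print(triplet_loss(anchor, positive, negative))  # 0.0

# Swapping Positive and Negative gives a violating triplet: 5 - 1 + 1.
print(triplet_loss(anchor, negative, positive))  # 5.0
```

Because the three networks share weights, in practice a single embedding function is applied to all three inputs and the loss is computed on its outputs.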

To learn an integrated embedding that overcomes batch effects, INSCT samples triplets in a batch-aware manner:

[Figure: batch-aware triplet sampling]
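The batch-aware sampling rule can be sketched on toy data as follows. This is a simplified illustration under stated assumptions, not INSCT's sampling code: it uses a brute-force nearest-neighbour search, picks the Positive as the Anchor's nearest cell in a different batch, and treats any same-batch cell farther away than the Positive as "dissimilar". The helper names `dist` and `sample_triplets` are hypothetical.

```python
import math
import random

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sample_triplets(points, batches, seed=0):
    """Return (anchor, positive, negative) index triplets.

    Positive: the anchor's nearest cell in a different batch.
    Negative: a random cell from the anchor's own batch that lies
    farther from the anchor than the positive does.
    """
    rng = random.Random(seed)
    triplets = []
    for a, (pa, ba) in enumerate(zip(points, batches)):
        other = [i for i, b in enumerate(batches) if b != ba]
        if not other:
            continue  # only one batch present, nothing to integrate
        p = min(other, key=lambda i: dist(pa, points[i]))
        d_pos = dist(pa, points[p])
        same = [i for i, b in enumerate(batches)
                if b == ba and i != a and dist(pa, points[i]) > d_pos]
        if not same:
            continue  # no sufficiently dissimilar cell in this batch
        triplets.append((a, p, rng.choice(same)))
    return triplets

# Two cell groups (x near 0 and x near 10), each observed in two
# batches that differ by a small shift in y (made-up values).
points = [[0.0, 0.0], [10.0, 0.0], [0.0, 1.0], [10.0, 1.0]]
batches = ["b1", "b1", "b2", "b2"]
print(sample_triplets(points, batches))
# [(0, 2, 1), (1, 3, 0), (2, 0, 3), (3, 1, 2)]
```

Each triplet pulls a cell toward its counterpart in the other batch and pushes it away from a different cell group in its own batch, which is what removes the batch shift in the learned embedding.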

What does it do?

For example, we simulated scRNA-seq data in which batch effects dominate the embedding:

[Figure: embedding of the simulated data, dominated by batch effects]

However, INSCT learns an integrated embedding where cells cluster by group instead of batch:

[Figure: integrated INSCT embedding of the simulated data]

Check out our interactive tutorials!

The following notebooks run in your web browser and let you explore INSCT interactively. We have prepared two analysis examples:

  1. Simulation dataset
  2. Pancreas dataset

Notebooks to reproduce the analyses described in our preprint can be found in the reproducibility folder.

Installation

INSCT depends on the following Python packages, which must be installed separately:

ivis==1.7.2
scanpy
hnswlib
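Before installing INSCT, it can help to confirm that these dependencies are present in the active environment. The following is a small sketch using only the standard library (`importlib.metadata`, Python 3.8 or later). The helper name `check_dependencies` and the `REQUIRED` mapping are illustrative and not part of INSCT.

```python
from importlib import metadata

# Package name -> pinned version (None means any version is accepted).
REQUIRED = {"ivis": "1.7.2", "scanpy": None, "hnswlib": None}

def check_dependencies(required=REQUIRED):
    """Map each package to 'ok', 'missing', or 'found X, need Y'."""
    report = {}
    for name, pinned in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if pinned is not None and installed != pinned:
            report[name] = f"found {installed}, need {pinned}"
        else:
            report[name] = "ok"
    return report

for package, status in check_dependencies().items():
    print(f"{package}: {status}")
```

Any package reported as missing can be installed with pip, for example `pip install ivis==1.7.2 scanpy hnswlib`.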

To install INSCT, use one of the following options:

GitHub

Install directly from GitHub using pip:

pip install git+https://github.com/lkmklsmn/insct.git

Alternatively, download the package from GitHub and install it locally:

git clone http://github.com/lkmklsmn/insct
cd insct
pip install .

Usage

Unsupervised model

Triplets are sampled based on transcriptional similarity alone. Inputs:

  1. AnnData object with PCs
  2. Batch vector

from insct.tnn import TNN
# adata: AnnData object with PCs; 'batch' identifies the batch vector
model = TNN()
model.fit(X=adata, batch_name='batch')
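The "AnnData object with PCs" required as input can be prepared with scanpy, which is already a dependency. The sketch below uses standard scanpy preprocessing calls. The helper name `prepare_adata`, the `target_sum` value, and the number of components are illustrative choices. It is also an assumption that INSCT reads the PCs from scanpy's default location, `adata.obsm["X_pca"]`, and that `'batch'` refers to a column of `adata.obs`.

```python
def prepare_adata(adata, n_pcs=50):
    """Normalize, log-transform and run PCA on an AnnData object in place."""
    import scanpy as sc  # imported lazily so the helper can be defined without scanpy

    sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
    sc.pp.log1p(adata)                            # log(1 + x) transform
    sc.pp.pca(adata, n_comps=n_pcs)               # PCs are stored in adata.obsm["X_pca"]
    return adata
```

Calling `adata = prepare_adata(adata)` on raw counts yields an object with PCs. Note that `n_pcs` must be smaller than both the number of cells and the number of genes.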

Supervised model

Triplets are sampled based on both transcriptional similarity and known labels. Inputs:

  1. AnnData object with PCs
  2. Batch vector
  3. Cell type vector

from insct.tnn import TNN
model = TNN()
model.fit(X=adata, batch_name='batch', celltype_name='Celltypes')

Semi-supervised model

Triplets are sampled based on both transcriptional similarity and known labels, ignoring the masked labels. Inputs:

  1. AnnData object with PCs
  2. Batch vector
  3. Cell type vector
  4. Masking vector (which labels to ignore)

from insct.tnn import TNN
model = TNN()
# batch_name is a placeholder here and must be defined before this call
model.fit(X=adata, batch_name='batch', celltype_name='Celltypes', mask_batch=batch_name)
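The idea of masking can be illustrated with a toy example. This sketch assumes that `mask_batch` names a batch whose cell type labels are treated as unknown. That reading is inferred from the argument name and the "which labels to ignore" description, not confirmed by INSCT's code. The helper `mask_labels` is hypothetical.

```python
def mask_labels(celltypes, batches, mask_batch):
    """Return a copy of `celltypes` with labels from `mask_batch` set to None."""
    return [None if b == mask_batch else c for c, b in zip(celltypes, batches)]

# Made-up labels for four cells in two batches.
celltypes = ["alpha", "beta", "alpha", "beta"]
batches = ["b1", "b1", "b2", "b2"]

print(mask_labels(celltypes, batches, mask_batch="b2"))
# ['alpha', 'beta', None, None]
```

Under this reading, cells with masked labels would fall back to similarity-based triplet sampling, while labeled cells would use their known cell types.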

Output

  1. Coordinates for the integrated embedding