Trajectorama is an algorithm that implements coexpression-based integration of multi-study single-cell trajectories. Trajectorama is described in the paper "Coexpression enables multi-study cellular trajectories of development and disease" by Brian Hie, Hyunghoon Cho, Bryan Bryson, and Bonnie Berger.
The most important dependency is a custom implementation of Louvain clustering, which can be installed with the commands below:
# Binary dependency (for Ubuntu/Debian).
sudo apt-get install bison flex
git clone https://github.com/brianhie/louvain-igraph
cd louvain-igraph
python setup.py install
Troubleshooting: To avoid conflicts with the main version of louvain-igraph, install the custom Louvain package first in a new conda environment, before installing any downstream dependencies (like scanpy and trajectorama).
Trajectorama can then be installed with:
python -m pip install trajectorama
We provide a basic API around the core algorithm that takes an expression matrix augmented with study information and returns a list of coexpression matrices, with corresponding indices into the original data:
import trajectorama
X = [ ... ] # Sample-by-gene expression matrix.
studies = [ ... ] # Study identifiers, one for each row of `X`.
Xs_coexpr, sample_idxs = trajectorama.transform(
    X, studies,
    corr_cutoff=0.7,
    corr_method='spearman',
    cluster_method='louvain',
    min_cluster_samples=500,
)
The coexpression matrix Xs_coexpr[i] is defined over the subset of cells X[sample_idxs[i], :]. See the documentation string of the transform() function at the top of trajectorama/trajectorama.py for the full list of parameters and default values.
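To give some intuition for the corr_method and corr_cutoff parameters, here is a minimal sketch of a thresholded Spearman coexpression matrix for a single group of cells. It is an illustration only, not Trajectorama's implementation: the helper name spearman_coexpression is hypothetical, and the sketch assumes that the cutoff zeroes out entries whose absolute correlation falls below it.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_coexpression(X_cluster, corr_cutoff=0.7):
    """Hypothetical helper: thresholded gene-by-gene Spearman correlation.

    X_cluster is a cell-by-gene array for one group of cells.
    """
    # Spearman correlation is Pearson correlation computed on ranks.
    ranks = rankdata(X_cluster, axis=0)  # Rank each gene across cells.
    with np.errstate(divide='ignore', invalid='ignore'):
        corr = np.corrcoef(ranks, rowvar=False)
    corr = np.nan_to_num(corr)  # Constant genes give NaN; treat as 0.
    # Assumption: weak correlations are set to zero, giving a sparse matrix.
    corr[np.abs(corr) < corr_cutoff] = 0.0
    return corr


rng = np.random.default_rng(0)
base = rng.normal(size=200)
X_cluster = np.column_stack([
    base,                   # Gene 0.
    base ** 3,              # Gene 1: monotone function of gene 0.
    rng.normal(size=200),   # Gene 2: independent of the others.
])
corr = spearman_coexpression(X_cluster)
```

In this toy example, genes 0 and 1 are related by a monotone transformation, so their rank correlation survives the cutoff, while the entry for the independent gene 2 is zeroed out.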
This list of coexpression matrices can then be used in further analysis. For example, you can flatten the matrices and use Scanpy to visualize them as a KNN graph based on distance in coexpression space:
from anndata import AnnData
import numpy as np
import scanpy as sc
from scipy.sparse import csr_matrix
# Save upper triangle and flatten.
n_features = X.shape[1]
triu_idx = np.triu_indices(n_features) # Indices of upper triangle.
# Stack one flattened upper triangle per coexpression matrix (one row each).
X_coexpr = np.vstack([
    X_coexpr_i[triu_idx].flatten() for X_coexpr_i in Xs_coexpr
])
X_coexpr = csr_matrix(X_coexpr)
# Plot KNN graph in coexpression space.
adata = AnnData(X_coexpr)
sc.pp.neighbors(adata)
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata)
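The flattening step can be checked in isolation on toy data. The sketch below uses small random symmetric matrices as stand-ins for real coexpression matrices (the names toy_matrices, flat, and pair are illustrative) and shows both the resulting shape and how a flattened column maps back to a gene pair.

```python
import numpy as np

n_features = 4
rng = np.random.default_rng(0)

# Three toy symmetric matrices standing in for coexpression matrices.
toy_matrices = []
for _ in range(3):
    A = rng.uniform(-1, 1, size=(n_features, n_features))
    toy_matrices.append((A + A.T) / 2)

# Keep only the upper triangle (including the diagonal), since the
# lower triangle of a symmetric matrix is redundant.
triu_idx = np.triu_indices(n_features)
flat = np.vstack([M[triu_idx] for M in toy_matrices])

# Each row is one matrix; each column is one (gene_i, gene_j) pair.
n_pairs = n_features * (n_features + 1) // 2

# Map a flattened column index back to its gene pair.
rows, cols = triu_idx
j = 5
pair = (int(rows[j]), int(cols[j]))
```

With n_features genes, each flattened row has n_features * (n_features + 1) / 2 entries, so the feature dimension grows quadratically with the number of genes.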
The example scripts below, which were used to generate the paper results, show more detailed usage of Trajectorama.
Trajectorama analyzes five large-scale studies of mouse neurons over multiple points in development.
The data are available at http://trajectorama.csail.mit.edu/data.tar.gz and can be downloaded with:
wget http://trajectorama.csail.mit.edu/data.tar.gz
tar xvf data.tar.gz
To preprocess the data, run the command:
python bin/process.py conf/mouse_develop.txt
This preprocessing step only needs to be done once. Then, we perform panclustering and coexpression matrix computation using the command:
python bin/mouse_develop.py > mouse_develop.log
This will save each coexpression matrix as a .npz file to a directory under target/sparse_correlations/. Computing all coexpression matrices should complete in around an hour when running on a single core.
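The on-disk layout of the cached files is not specified here. Assuming they are SciPy sparse matrices written with scipy.sparse.save_npz (an assumption suggested only by the directory name), a round trip looks like the sketch below. The file name toy_coexpr.npz is hypothetical, and a temporary directory is used so that nothing under target/ is touched.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Toy thresholded coexpression matrix (mostly zeros after a cutoff).
corr = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_coexpr.npz')  # Hypothetical file name.
    # Assumption: cached matrices use SciPy's sparse .npz format.
    sparse.save_npz(path, sparse.csr_matrix(corr))
    loaded = sparse.load_npz(path)

restored = loaded.toarray()  # Back to a dense array for inspection.
```

If the files were instead written with np.savez, np.load would be the appropriate reader; check the saving code in bin/mouse_develop.py before relying on either.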
The downstream analysis can then be performed on these cached matrices using the commands:
python bin/mouse_develop_cached.py >> mouse_develop.log
python bin/mouse_develop_dictionary.py >> mouse_develop.log
This will log some relevant statistics and save visualizations under the figures/ directory.
We can perform a similar workflow for human hematopoiesis by running the commands:
# Download (if not already done for the mouse data).
wget http://trajectorama.csail.mit.edu/data.tar.gz
tar xvf data.tar.gz
# Preprocess.
python bin/process.py conf/hematopoiesis.txt
# Analyze.
python bin/hematopoiesis.py > hematopoiesis.log
python bin/hematopoiesis_cached.py >> hematopoiesis.log
python bin/hematopoiesis_dictionary.py >> hematopoiesis.log
We can perform a similar workflow for mouse and human microglia in various conditions by running the commands:
# Download (if not already done for the mouse data).
wget http://trajectorama.csail.mit.edu/data.tar.gz
tar xvf data.tar.gz
# Preprocess.
python bin/process.py conf/microglia.txt
# Analyze.
python bin/microglia.py > microglia.log
python bin/microglia_cached.py >> microglia.log
For any questions or concerns, create an issue in the repository or contact brianhie@mit.edu. We will do our best to answer promptly. Feel free to create a pull request and contribute!