monocle-scripts

Command line scripts wrapping functions in Monocle3

1. Installation

1.1. Install Monocle3

a. Manually

Create and activate a conda environment with r-base and python, then follow monocle3 installation guide.

b. Conda-assisted

Use provided install_monocle3.sh which uses conda to install most of the dependencies.

1.2. Install monocle-scripts

$ conda activate <monocle3-env>
$ Rscript -e 'devtools::install_github("ebi-gene-expression-group/monocle-scripts")'

The installed executable script is at $CONDA_PREFIX/<monocle3-env>/lib/R/library/MonocleScripts/exec/monocle3.

1.3. Test installation

This test requires bats available in the same conda environment as monocle-scripts. Run:

$ wget 'https://github.com/ebi-gene-expression-group/monocle-scripts/raw/develop/monocle-scripts-post-install-tests.bats'
$ chmod +x monocle-scripts-post-install-tests.bats
$ ./monocle-scripts-post-install-tests.bats

2. Usage

Currently only covers steps introduced in Monocle3 documentation: Constructing single-cell trajectories

Usage: monocle3 [-h] <command> ...

Commands:
  create            Creation of Monocle 3 object from expression and metadata.
  preprocess        Normalisation, scaling, initial dimension reduction.
  reduceDim         Reduce dimensionality by UMAP.
  partition         Partition cells into groups.
  learnGraph        Learn trajectories.
  orderCells        Adjust the start of pseudo-time
  diffExp           Identify genes with varing expression along trajectories.
  plotCells         Visualise trajectories.

Options:
  -h, --help        Show this help message and exit

2.1. Create CDS object

Usage: monocle3 create [options] <output_object>

<output_object>:
		Output object, can be SingleCellExperiment(sce), Seurat object(seurat), or CellDataSet V3(cds3). Only cds3 is supported currently.

Options:
	-F STR, --output-object-format=STR
		Format of output object. [Default: cds3]

	-I, --introspective
		Print introspective information of the output object.

	--expression-matrix=STR
		Expression matrix, genes as rows, cells as columns. Required input. Provide as TSV, CSV, RDS or MTX.

	--cell-metadata=STR
		Per-cell annotation, optional. Row names must match the column names of the expression matrix. Provide as TSV, CSV or RDS.

	--gene-annotation=STR
		Per-gene annotation, optional. Row names must match the row names of the expression matrix. Provide as TSV, CSV or RDS.

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.2. Preprocess

Usage: monocle3 preprocess [options] <input_object> <output_object>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_object>:
		Output object, can be SingleCellExperiment(sce), Seurat object(seurat), or CellDataSet V3(cds3). Only cds3 is supported currently.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-object-format=STR
		Format of output object. [Default: cds3]

	-I, --introspective
		Print introspective information of the output object.

	--method=STR
		The initial dimension method to use, choose from {PCA, LSI}. [Default: PCA]

	--num-dim=INT
		The dimensionality of the reduced space. [Default: 50]

	--norm-method=STR
		Determines how to transform expression values prior to reducing dimensionality, choose from {log, size_only}. [Default: log]

	--use-genes=STR
		Manually subset the gene pool to these genes for dimensionality reduction, NULL to skip. [Default: NULL]

	--residual-model-formula-str=STR
		A string model formula specifying effects to subtract from the data, NULL to skip. [Default: NULL]

	--pseudo-count=FLOAT
		Amount to increase expression values before dimensionality reduction. [Default: 1]

	--no-scaling
		When this option is NOT set, scale each gene before running trajectory reconstruction.

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.3. Reduce dimensionality

Usage: monocle3 reduceDim [options] <input_object> <output_object>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_object>:
		Output object, can be SingleCellExperiment(sce), Seurat object(seurat), or CellDataSet V3(cds3). Only cds3 is supported currently.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-object-format=STR
		Format of output object. [Default: cds3]

	-I, --introspective
		Print introspective information of the output object.

	--max-components=INT
		The dimensionality of the reduced space. [Default 2]

	--reduction-method=STR
		The algorithm to use for dimensionality reduction, choose from {UMAP, tSNE, PCA, LSI}. [Default: UMAP]

	--preprocess-method=STR
		The preprocessing method used on the data, choose from {PCA, LSI}. [Default: PCA]

	--cores=CORES
		The number of cores to be used for dimensionality reduction. [Default: 1]

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.4. Partition cells

Usage: monocle3 partition [options] <input_object> <output_object>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_object>:
		Output object, can be SingleCellExperiment(sce), Seurat object(seurat), or CellDataSet V3(cds3). Only cds3 is supported currently.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-object-format=STR
		Format of output object. [Default: cds3]

	-I, --introspective
		Print introspective information of the output object.

	--reduction-method=STR
		The dimensionality reduction to base the clustering on, choose from {UMAP, tSNE, PCA, LSI}. [Default: UMAP]

	--knn=INT
		Number of nearest neighbours used for Louvain clustering. [Default: 20]

	--weight
		When this option is set, calculate the weight for each edge in the kNN graph.

	--louvain-iter=INT
		The number of iteration for Louvain clustering. [Default: 1]

	--resolution=FLOAT
		Resolution of clustering result, specifying the granularity of clusters. Not used by default and the standard igraph louvain clustering algorithm will be used.

	--partition-qval=FLOAT
		The q-value threshold used to determine the partition of cells. [Default: 0.05]

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.5. Learn graph

Usage: monocle3 learnGraph [options] <input_object> <output_object>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_object>:
		Output object, can be SingleCellExperiment(sce), Seurat object(seurat), or CellDataSet V3(cds3). Only cds3 is supported currently.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-object-format=STR
		Format of output object. [Default: cds3]

	-I, --introspective
		Print introspective information of the output object.

	--no-partition
		When this option is set, learn a single tree structure for all the partitions. If not set, use the partitions calculated when clustering and identify disjoint graphs in each.

	--no-close-loop
		When this option is set, skip the additional run of loop closing after computing the principal graphs to identify potential loop structure in the data space.

	--euclidean-distance-ratio=FLOAT
		The maximal ratio between the euclidean distance of two tip nodes in the spanning tree inferred from SimplePPT algorithm and that of the maximum distance between any connecting points on the spanning tree allowed to be connected during the loop closure procedure. [Default: 1]

	--geodesic-distance-ratio=FLOAT
		The minimal ratio between the geodestic distance of two tip nodes in the spanning tree inferred from SimplePPT algorithm and that of the length of the diameter path on the spanning tree allowed to be connected during the loop closure procedure. (Both euclidean_distance_ratio and geodestic_distance_ratio need to be satisfied to introduce the edge for loop closure.)

	--no-prune-graph
		When this option is set, perform an additional run of graph pruning to remove smaller insignificant branches.

	--minimal-branch-len=INT
		The minimal length of the diameter path for a branch to be preserved during graph pruning procedure. [Default: 10]

	--orthogonal-proj-tip
		When this option is set, perform orthogonal projection for cells corresponding to the tip principal points.

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.6. Order cells along trajectories

Usage: monocle3 orderCells [options] <input_object> <output_object>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_object>:
		Output object, can be SingleCellExperiment(sce), Seurat object(seurat), or CellDataSet V3(cds3). Only cds3 is supported currently.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-object-format=STR
		Format of output object. [Default: cds3]

	-I, --introspective
		Print introspective information of the output object.

	--root-pr-nodes=STR
		The starting principal points. We learn a principal graph that passes
		through the middle of the data points and use it to represent the
		developmental process. Exclusive with --root-cells.

	--root-cells=STR
		The starting cells. Each cell corresponds to a principal point and multiple cells can correspond to the same principal point. Exclusive with --root-pr-nodes.

	--cell-phenotype=STR
		The cell phenotype (column in pdata) used to identify root principal nodes.

	--root-type=STR
		The value of the phenotype specified by "--cell-pheontype" that defines cells root principal nodes.

	--reduction-method=STR
		The dimensionality reduction that was used for clustering, choose from {UMAP, tSNE, PCA, LSI}. [Default: UMAP]

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.7. Test of differential expression over a trajectory

Usage: monocle3 diffExp [options] <input_object> <output_table>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_table>:
		Output table file name.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-table-format=STR
		Format of output table, choose from {tsv, csv}. [Default: tsv]

	-I, --introspective
		Print introspective information of the output table.

	--neighbor-graph=STR
		What neighbor graph to use, "principal_graph" recommended for trajectory analysis, choose from {principal_graph, knn}. [Default: knn]

	--reduction-method=STR
		The dimensionality reduction to base the clustering on, choose from {UMAP}. [Default: UMAP]

	--knn=KNN
		Number of nearest neighbors used for building the kNN graph which is passed to knn2nb function during the Moran's I (Geary's C) test procedure.

	--method=METHOD
		A character string specifying the method for detecting significant genes showing correlated expression along the principal graph embedded in the low dimensional space, choose from {Moran_I}. [Default: Moran_I]

	--alternative=ALTERNATIVE
		A character string specifying the alternative hypothesis, choose from {greater, less, two.sided}. [Default: greater]

	--cores=CORES
		The number of cores to be used while testing each gene for differential expression. [Default: 1]

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

2.7. Plot trajectories

Usage: monocle3 plotCells [options] <input_object> <output_plot>

<input_object>:
		Input object, can be SingleCellExperiment(sce), Seurat object(seurat), CellDataSet V2(cds2) or V3(cds3). Only cds3 is supported currently.

<output_plot>:
		Output plot file name.

Options:
	-f STR, --input-object-format=STR
		Format of input object. [Default: cds3]

	-F STR, --output-plot-format=STR
		Format of output plot, choose from {png, pdf}. [Default: png]

	--xdim=XDIM
		The column of reducedDimS(cds) to plot on the horizontal axis. [Default: 1]

	--ydim=YDIM
		The column of reducedDimS(cds) to plot on the vertical axis. [Default: 2]

	--reduction-method=STR
		The dimensionality reduction for plotting, choose from {UMAP, tSNE, PCA, LSI}. [Default: UMAP]

	--color-cells-by=COLOR-CELLS-BY
		The cell attribute (e.g. the column of pData(cds)) to map to each cell's color, or one of {clusters, partitions, pseudotime}. [Default: pseudotime]

	--genes=STR
		A list of gene IDs/short names to plot, one per panel.

	--norm-method=STR
		Determines how to transform expression values for plotting, choose from {log, size_only}. [Default: log]

	--cell-size=CELL-SIZE
		The size of the point for each cell. [Default: 1.5]

	--alpha=ALPHA
		The alpha aesthetics for the original cell points, useful to highlight the learned principal graph.

	--label-cell-groups
		If set, display the cell group names directly on the plot. Otherwise include a color legend on the side of the plot.

	--no-trajectory-graph
		When this option is set, skip displaying the trajectory graph inferred by learn_graph().

	--label-groups-by-cluster
		If set, and setting --color-cells-by to something other than cluster, label the cells of each cluster independently. Can result in duplicate labels being present in the manifold.

	--label-leaves
		If set, label the leaves of the principal graph.

	--label-roots
		If set, label the roots of the principal graph.

	--label-branch-points
		If set, label the branch points of the principal graph.

	-v, --verbose
		Emit verbose output.

	-h, --help
		Show this help message and exit

3. Limitation and known issues

Only support reading/writing CDS version3 object