A simple, interactive, and customizable single cell workflow manager

Core Concepts • Overview • Features • Usage • Upcoming Features • Quality Control Plotting

Core Concepts

Features

Usage

Install

```bash
pip install cellforest
```

Install Accompanying R Package

```bash
git clone https://github.com/TheAustinator/cellforest.git
R -e "library('devtools'); library('parallel'); install('~/code/cellforest/cellforestR', dependencies = TRUE, Ncpus = detectCores())"
```

**Import**

```python
from cellforest import CellBranch
```

Examples

Upcoming Features

Quality Control Plotting

Following the tree-of-parameters paradigm, Cellforest automatically generates quality control (QC) plots after each process run. This means a user can retroactively look up preliminary analyses, such as how the cells clustered, without having to re-run the pipeline with different parameters. In contrast to ad hoc parameter picking (reactive), the QC plotting implementation predefines all plots over a wide range of parameters (proactive), which saves considerable time in analyses that require constant iteration over upstream parameters.

I. Example plots

Here is a selection of plots commonly used for scRNA-Seq that are already implemented in Cellforest. For the full list, see All implemented plots.

| Plot definition and method | Description | Use case | Available and suggested `plot_kwargs` |
| --- | --- | --- | --- |
| Plot config name: `_UMIS_VS_GENES_SCAT_`. Method (use at or after "normalize"): `plot_umis_vs_genes_scat()` | Scatter plot showing the relationship between UMI and gene counts per cell. | Generally there should be a good correlation. Filter out damaged cells based on low UMI and gene counts and/or low UMI count with moderate gene count (high percentage of mitochondrial genes). | `stratify`: `none`, `sample_id`; `plot_size`: `[800, 800]`; `bins`: `50`; `alpha`: `0.4`; all keyword arguments for `pyplot.scatter()` |
| Plot config name: `_HIGHEST_EXPRS_DENS_`. Method (use at or after "normalize"): `plot_highest_exprs_dens()` | Density plots showing the distribution of UMI counts per cell in the 50 highest-expressing genes. | Determine the main expressing genes to ensure that cells are filtered correctly and that there are not many dead cells (e.g., mito genes among the top expressed genes) influencing the analysis. | `stratify`: `none`, `sample_id`; `plot_size`: `[1600, 1600]` |
| Plot config name: `_UMAP_EMBEDDINGS_SCAT_`. Method (use at or after "reduce"): `plot_umap_embeddings_scat()` | Facet plot showing the relationship between principal components in UMAP. | Examine sources of variance (donor-donor, lane-lane, timing, sample, etc.) and identify batch effects. | `stratify`: `none`, `sample_id`, `nFeature_RNA`; `plot_size`: `[1600, 1600]`; `alpha`: `0.4`; `npcs`: `2` |
| Plot config name: `_PERC_RIBO_PER_CELL_VLN_`. Method (use at or after "cluster"): `plot_perc_ribo_per_cell_vln()` | Violin plots showing the distribution of ribosomal gene percentages per cell, stratified by cluster. | TODO-QC: FILL IN HERE. | `stratify`: `cluster`; `plot_size`: `[1600, 800]` |
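In the rows above, each method name appears to be the plot config name in lowercase with a `plot_` prefix. The sketch below illustrates that naming pattern only. The helper names `parse_plot_config_name` and `method_name_for` are hypothetical and are not part of cellforest, and the pattern is inferred from the examples rather than taken from the library source.

```python
import re
from typing import Tuple

# Hypothetical helpers: the naming pattern is inferred from the example
# table, not read from the cellforest source code.
PLOT_NAME_RE = re.compile(r"^_(?P<name>[A-Z0-9_]+)_(?P<kind>[A-Z0-9]+)_$")


def parse_plot_config_name(config_name: str) -> Tuple[str, str]:
    """Split a config name such as '_UMIS_VS_GENES_SCAT_' into (name, type)."""
    match = PLOT_NAME_RE.match(config_name)
    if match is None:
        raise ValueError(f"not a plot config name: {config_name!r}")
    return match.group("name"), match.group("kind")


def method_name_for(config_name: str) -> str:
    """Derive the matching method name: lowercase with a 'plot_' prefix."""
    name, kind = parse_plot_config_name(config_name)
    return f"plot_{name.lower()}_{kind.lower()}"


print(parse_plot_config_name("_UMIS_VS_GENES_SCAT_"))
print(method_name_for("_PERC_RIBO_PER_CELL_VLN_"))
```

The last underscore-delimited token is treated as the plot type (`SCAT`, `DENS`, `VLN`), and everything before it as the plot name.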

II. Quick specification

Plots can be declared either before the tree is run or afterwards, by forcing generation of the not-yet-created plots. As with process run outputs, all plots are stored in `_plots`, inside the folder of the corresponding process output. Here is an example configuration for QC plotting:

```yaml
plot_map:
  root:
    _UMIS_PER_BARCODE_RANK_CURV_: ~
  normalize:
    _GENES_PER_CELL_HIST_:
      plot_kwargs:
        stratify:
          - sample_id
          - none
        plot_size: [800, 800]
```

- This snippet belongs in `default_config.yaml`, alongside the process specifications. Second-level keys (`root`, `normalize`) define plots for the process with the corresponding alias/name.
- Plot names follow the format `_<PLOT_NAME>_<PLOT_TYPE>_`. For the full list of available plot names, refer to All implemented plots.
- Parameters can be specified for each plot. For example, `stratify` groups the cells by a specified metadata column. Here, two plots are created: the first stratified by `sample_id` and the second on all data (no stratification), each 800x800 pixels.
- When you initialize a branch (`branch = cellforest.from_sample_metadata(root_dir, meta, branch_spec=branch_spec)`) or run a process (e.g., `branch.process.normalize()`), the specified plots are generated immediately after the process finishes running.
- For advanced plotting specifications, refer to Parametrizing QC plotting.
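To make the "one plot per `stratify` value" behaviour concrete, here is a minimal sketch that expands a plain-dict mirror of the example configuration into individual plot requests. It is an illustration under stated assumptions, not cellforest's implementation: `expand_plot_map` is a hypothetical helper, and treating a scalar or missing `stratify` as a single plot is an assumption.

```python
from typing import Any, Dict, Iterator, Tuple

# Plain-dict mirror of the example configuration (YAML `~` becomes None).
plot_map = {
    "root": {"_UMIS_PER_BARCODE_RANK_CURV_": None},
    "normalize": {
        "_GENES_PER_CELL_HIST_": {
            "plot_kwargs": {
                "stratify": ["sample_id", "none"],
                "plot_size": [800, 800],
            }
        }
    },
}


def expand_plot_map(
    plot_map: Dict[str, Any]
) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (process, plot_name, kwargs), one entry per `stratify` value.

    Hypothetical helper, not part of cellforest.
    """
    for process, plots in plot_map.items():
        for plot_name, spec in plots.items():
            kwargs = dict((spec or {}).get("plot_kwargs", {}))
            stratify = kwargs.pop("stratify", None)
            # Assumption: a scalar or missing `stratify` yields a single plot.
            values = stratify if isinstance(stratify, list) else [stratify]
            for value in values:
                yield process, plot_name, {"stratify": value, **kwargs}


for entry in expand_plot_map(plot_map):
    print(entry)
```

With the example configuration this yields one request for `root` and two for `normalize`, matching the two `_GENES_PER_CELL_HIST_` plots described above.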

Troubleshooting

Errors with cellforestR or with processes that use R

Possible indicator: the error message mentions miniconda.
Solution: ensure the global environment variable `RETICULATE_PYTHON` is set to your Python path (e.g. `/usr/bin/python3`).
- In R, set it via:
```
Sys.setenv(RETICULATE_PYTHON = "/usr/bin/python3")
system("echo $RETICULATE_PYTHON")
library(reticulate)
```
- In the shell, set it via `export RETICULATE_PYTHON=/usr/bin/python3` (may require restarting RStudio, if you use it)
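If R is started as a child process of your Python session, the variable can also be set from Python before any R-backed process runs, since child processes inherit the parent's environment. This is a minimal sketch under that assumption. Whether cellforest launches R this way is not confirmed here, and using `sys.executable` (the interpreter running the current session) as the target path is an illustrative choice.

```python
import os
import sys

# Assumption: R is launched as a child process and inherits this environment.
# Point reticulate at the interpreter running the current session.
os.environ["RETICULATE_PYTHON"] = sys.executable

print(os.environ["RETICULATE_PYTHON"])
```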