A Snakemake workflow for using PolyA_DB and UCSC Liftover with Cellranger.
Some genes are not accurately annotated in the reference genome. Here, we use information provided by PolyA_DB v3.2 to update the coordinates, then the UCSC Liftover tool to convert them to a more recent genome build. Next, we use Cellranger to create the reference and count matrix. Finally, by taking advantage of Snakemake's integrated Conda and Singularity support, we can run the whole thing in an isolated environment.
Our pipeline is available on GitHub (see below!), on the Snakemake Workflow Catalogue, and on WorkflowHub.
A full walkthrough on how to install and use this pipeline can be found here.
To take advantage of Singularity, you'll need to install it separately. If you are running on a Linux system, then Singularity can be installed from conda like so:
conda install -n snakemake -c conda-forge singularity
It's a bit more challenging for other operating systems. Your best bet is to follow their instructions here. But don't worry! Singularity is not required! Snakemake will still run each step in its own Conda environment; it just won't put each Conda environment in a container.
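As a rough sketch of the two modes described above, the snippet below builds a Snakemake command line that always uses Conda and adds containers only when Singularity is found on the `PATH`. The `--use-conda`, `--use-singularity`, and `--cores` flags are standard Snakemake options; the core count of 4 and the variable names are illustrative assumptions, so check the walkthrough for the invocation this pipeline recommends.

```shell
# Assumption: 4 cores is an arbitrary example value; adjust for your machine.
CORES=4

# Conda environments are always used.
FLAGS="--use-conda"

# Add container support only if Singularity is installed.
if command -v singularity >/dev/null 2>&1; then
    FLAGS="$FLAGS --use-singularity"
fi

# Print the command rather than running it, so it can be inspected first.
echo "snakemake --cores $CORES $FLAGS"
```

Printing the command first makes it easy to confirm which mode will be used before committing to a full run.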
Navigate to our release page on GitHub and download the most recent version.
Alternatively, for the bleeding edge, please clone the repo like so:
git clone https://github.com/IMS-Bio2Core-Facility/polya_liftover
⚠️ Heads Up! The bleeding edge may not be stable, as it contains all active development.
This pipeline expects de-multiplexed fastq.gz files, normally produced by some derivative of bcl2fastq after sequencing. They can (technically) be placed anywhere, but we recommend creating a data directory in your project for them.
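A minimal sketch of that recommended layout, assuming you are at the project root: it creates the data directory and counts the fastq.gz files inside it. The variable names are illustrative only, and the pipeline may still need to be told where the files live, so treat this as a sanity check rather than a required step.

```shell
# The directory name "data" follows the recommendation above; it is not mandatory.
DATA_DIR="data"
mkdir -p "$DATA_DIR"

# Count the de-multiplexed read files; zero means nothing has been copied in yet.
N_FASTQ=$(find "$DATA_DIR" -type f -name '*.fastq.gz' | wc -l | tr -d ' ')
echo "Found $N_FASTQ fastq.gz file(s) in $DATA_DIR/"
```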
The analysis pipeline was run using Snakemake v6.11.1.
The full version and software lists can be found under the relevant yaml files in workflow/envs.
All reasonable efforts have been made to ensure that the repository adheres to the best practices outlined here.
For a full discussion on the analysis methods, please see the technical documentation.
Briefly, gene coordinates were updated with PolyA_DB version 3, converted to more recent builds with Liftover, and referenced/counted with Cellranger.
Reproducible results are the cornerstone of the scientific process.
By running the pipeline with snakemake in a singularity/docker image using conda environments, we can pin all software versions, maximising reproducibility.
We also strive to make this pipeline as FAIR/O compliant as possible. In addition to the usual availability on GitHub, it is available at both the Snakemake Workflow Catalogue and WorkflowHub.