nextflow pipeline for running cellranger
You must have nextflow and cellranger installed and in your path. These are both very easy to install. Nextflow is a single command to download and install, and cellranger is distributed as a tarball full of binaries, but you have to agree to the license on their website and give them your information to get a link.
This pipeline now supports using CellBender to filter the CellRanger output. If you would like to use this functionality, you will need to have CellBender installed in a conda environment.
There is a sample configuration file that works for running this pipeline on
Lewis included in this package (nextflow.config
). If you're running this on
Lewis, you can try using the default configuration file by not doing anything.
Otherwise, copy it to the directory where you are going to run this pipeline,
and edit it to work on your cluster or other setup. See nextflow
documentation for help with this.
CellBender is multiple orders of magnitude faster on a GPU than a CPU, so running it on a machine with a GPU is highly recommended. The default configuration file provides an example of how I have this configured on the Lewis cluster, and should be reasonably simple to adapt to your cluster.
You'll need to make or download a cellranger reference first. This pipeline does not do that for you. See cellranger documentation.
First, gather up your reference and your fastq files from the sequencer.
Next, make a sample sheet listing your samples. This will be in csv format with
a header. The only required column is sample_id
, which gives the prefix for
that library in the fastq files. For example, if these are your files:
HCC1_S5_R1_001.fastq.gz
HCC1_S5_R2_001.fastq.gz
HCC2_S5_R1_001.fastq.gz
HCC2_S5_R2_001.fastq.gz
then you have two libraries, HCC1 and HCC2, and you should have two lines (plus a header!) in your csv file, withsample_id
s HCC1 and HCC2. You can also add other columns to the csv file to use later, like sample treatment. Like so:
sample_id,treatment
HCC1,control
HCC2,treated
Then, it's a single command:
nextflow run WarrenLab/cellranger-nf -latest \
--fastqs_dir /path/to/directory/containing/fastqs/ \
--ref_dir /path/to/directory/containing/cellranger/reference/ \
--sampleSheet samples.csv \
--nuclei # only if this is a single nucleus rather than single cell library \
--cellbender # only if you want to run CellBender filtering
Your output will appear in the current directory if everything works right.
If you did not specify the --cellbender
option, this pipeline runs
cellranger aggr
to aggregate the results from the different libraries. You
can take the aggregated output and load it in Seurat, scanpy, or the single-
cell processing platform of your choice.
If you did specify the --cellbender
option, the cellbender filtered output is
unfortunately not compatible with cellranger aggr
, so you will have to do the
aggregation yourself with the individual filtered library h5 files. The pipeline
adds some statistics to the final aggregation table that should help you do
this. My recommendation is to use these statistics to calculate the mean reads
per barcode for each library, and then regress this variable out. See
this github issue for more details.