CRUK tools
This repository provides some tools for processing genomics data on the CRUK Cambridge Institute SLURM server.
Alignment
solo_align.sh aligns a single library (single-end or paired-end).
multi_align.sh is a convenience wrapper that submits alignment jobs for many libraries in a data set.
guess_encoding.py guesses the Phred quality encoding of the input reads, for use by the aligner.
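This README does not document how multi_align.sh finds libraries or what arguments it passes to solo_align.sh. The sketch below therefore rests on assumptions throughout. It assumes paired-end files are named `<library>_1.fastq.gz` / `<library>_2.fastq.gz`, and that solo_align.sh takes the FASTQ paths as positional arguments. Under those assumptions, it groups files into libraries and builds one SLURM `sbatch` command per library.

```python
# Hypothetical sketch: the file-naming convention (<library>_1/_2.fastq.gz)
# and the positional arguments passed to solo_align.sh are assumptions,
# not the documented interface of multi_align.sh.
import re
from collections import defaultdict

# Library name, an optional _1/_2 mate suffix, then the .fastq.gz extension.
FASTQ_NAME = re.compile(r"^(?P<lib>.+?)(?:_[12])?\.fastq\.gz$")


def group_libraries(filenames):
    """Map library name -> FASTQ files, with mate 1 listed before mate 2."""
    libraries = defaultdict(list)
    for name in sorted(filenames):
        match = FASTQ_NAME.match(name)
        if match:  # ignore anything that is not a gzipped FASTQ file
            libraries[match.group("lib")].append(name)
    return dict(libraries)


def sbatch_commands(filenames, script="solo_align.sh"):
    """Return one sbatch argv list per library (single- or paired-end)."""
    return [
        ["sbatch", f"--job-name=align_{lib}", script, *files]
        for lib, files in sorted(group_libraries(filenames).items())
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Printing the lists first gives a cheap dry run before anything is queued on the cluster.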
Alignment is performed with the Subread aligner. The pipeline also requires samtools and MarkDuplicates (from Picard).
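Phred encoding matters because the same quality character means different scores under the two common offsets. Phred+33 starts at `!` (ASCII 33), whereas Phred+64 starts at `@` (ASCII 64). The following is a minimal sketch of a common heuristic for guessing the offset, not necessarily the logic inside guess_encoding.py. It samples the first reads of a FASTQ file and inspects the lowest quality character seen: any character below `@` can only occur in Phred+33 data.

```python
# Minimal sketch of Phred-offset detection; an assumed heuristic, not
# necessarily the one implemented by guess_encoding.py.
import gzip
from itertools import islice


def quality_lines(path, max_reads=10000):
    """Yield the quality string (4th line) of up to max_reads FASTQ records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(islice(handle, 4 * max_reads)):
            if i % 4 == 3:
                yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Return 33 or 64 based on the lowest quality character observed."""
    lowest = min((min(q) for q in qualities if q), default=None)
    if lowest is None:
        raise ValueError("no quality strings to inspect")
    # Characters below '@' (ASCII 64) never appear in Phred+64 data.
    return 33 if ord(lowest) < 64 else 64
```

For example, `guess_phred_offset(quality_lines("sample_1.fastq.gz"))` inspects up to 10,000 reads; the file name is hypothetical. Sampling keeps the check fast. However, a small sample of very high-quality Phred+33 reads could in principle contain no character below `@`, so the guess is a heuristic rather than a proof.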
Read counting
counter.R provides a template for read counting that produces a gene-by-sample count matrix.
It requires the BAM files to be counted and a set of GTF annotation files, and it uses the featureCounts function in the Rsubread package.
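By default, featureCounts groups exons into genes (meta-features) and counts a read only when it overlaps exactly one gene. Reads overlapping several genes, or none, stay unassigned. The following is a deliberately simplified Python sketch of that rule, showing how a gene-by-sample matrix arises. Reads and exons are plain 1-based, inclusive `(chrom, start, end)` tuples rather than parsed BAM and GTF records. Strandedness, paired-end fragments and multi-mapping reads are ignored. It illustrates the idea and is not a replacement for calling featureCounts.

```python
# Simplified sketch of featureCounts-style gene-level counting; reads and
# exons are plain tuples here, not parsed BAM/GTF records.
from collections import Counter


def overlapping_genes(read, exons):
    """Genes with at least one exon overlapping the read by >= 1 base."""
    chrom, start, end = read
    return {gene for c, s, e, gene in exons if c == chrom and s <= end and start <= e}


def count_matrix(samples, exons):
    """Return (genes, sample_names, matrix) where matrix[i][j] counts gene i in sample j."""
    genes = sorted({exon[3] for exon in exons})
    names = sorted(samples)
    per_sample = {}
    for name in names:
        counts = Counter()
        for read in samples[name]:
            hits = overlapping_genes(read, exons)
            if len(hits) == 1:  # ambiguous (multi-gene) and unassigned reads are dropped
                counts[hits.pop()] += 1
        per_sample[name] = counts
    matrix = [[per_sample[n][g] for n in names] for g in genes]
    return genes, names, matrix
```

The linear scan over exons keeps the sketch short. A real implementation needs an interval index to handle whole-genome annotations efficiently.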
Data mangling
cram2fastq.sh converts a CRAM file into FASTQ for input to the alignment pipelines above.
sanger_dump.sh converts an entire folder of CRAM files into FASTQ files.
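The README does not say which tool the two conversion scripts use internally. The sketch below assumes samtools, which the alignment pipeline already requires. It builds, for each CRAM file in a folder, the paired-end pipeline given as an example in the samtools documentation: `samtools collate` groups mates together, then `samtools fastq` writes mate 1 and mate 2 to separate files. Singletons and reads without mate flags are discarded. The output naming (`<stem>_1.fastq`, `<stem>_2.fastq`) is a hypothetical convention.

```python
# Hypothetical sketch: assumes samtools does the conversion; output names
# <stem>_1.fastq / <stem>_2.fastq are an illustrative convention.
import shlex
from pathlib import Path


def cram_to_fastq_pipeline(cram, outdir):
    """Shell pipeline converting one paired-end CRAM into two FASTQ files."""
    cram = Path(cram)
    prefix = Path(outdir) / cram.stem
    collate = ["samtools", "collate", "-u", "-O", str(cram)]
    fastq = [
        "samtools", "fastq",
        "-1", f"{prefix}_1.fastq", "-2", f"{prefix}_2.fastq",
        "-0", "/dev/null", "-s", "/dev/null",  # drop unflagged reads and singletons
        "-n",  # keep read names unchanged (no /1, /2 suffixes)
    ]
    return f"{shlex.join(collate)} | {shlex.join(fastq)}"


def folder_pipelines(folder, outdir):
    """One pipeline per *.cram file in folder, in sorted order."""
    return [cram_to_fastq_pipeline(p, outdir) for p in sorted(Path(folder).glob("*.cram"))]
```

Each string can be run with `subprocess.run(cmd, shell=True, check=True)` or wrapped in an `sbatch` job. Decoding CRAM also requires access to the reference sequence the file was compressed against, which may need to be configured separately.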
Other
cell_ranger.sh calls the Cell Ranger pipeline to create a count matrix from single-cell transcriptomics data generated on the 10x Genomics platform.