CRUK tools

This repository provides some tools for processing genomics data on the CRUK Cambridge Institute SLURM server.

Alignment

solo_align.sh provides a script for aligning a single library (single-end or paired-end).
multi_align.sh is a convenience wrapper to submit alignment jobs for many libraries in a data set.
guess_encoding.py guesses the Phred encoding for the aligner.

Alignment is performed using the subread aligner. It also requires samtools and MarkDuplicates.

Read counting

counter.R provides a template for read counting to produce a gene-by-sample count matrix. It requires specification of the BAM files for which to perform the counting as well as a set of GTF annotation files. It will use the featureCounts function in the Rsubread package.

Data mangling

cram2fastq.sh will convert a CRAM file into FASTQ for entry into the alignment pipelines above.
sanger_dump.sh will convert an entire folder of CRAM files into FASTQs.

Other

cell_ranger.sh will call the CellRanger pipeline to create a count matrix for single-cell transcriptomics data from the 10X Genomics platform.

LTLA/CRUKtools

CRUK tools