Snakemake-exome is a Snakemake workflow that generates alignments from exome sequencing data (or similar targeted DNA-sequencing data). The workflow is designed to handle paired-end (and optionally multi-lane) sequencing data. Processing of patient-derived xenograft (PDX) samples is also supported by using disambiguate to separate graft and host sequence reads.
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and its DOI (see above).
The standard (non-PDX) workflow essentially performs the following steps:
- The input reads are trimmed to remove adapters and/or poor quality base calls using cutadapt.
- The trimmed reads are aligned to the reference genome using bwa mem.
- The alignments are sorted and indexed using samtools.
- Bam files from multiple lanes are merged using samtools merge.
- Picard MarkDuplicates is used to remove optical/PCR duplicates.
- The final alignments are indexed using samtools index.
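The steps above can be sketched as a sequence of command lines. The following is a minimal illustration, not the workflow's actual rules: the function names, file naming scheme, adapter sequence, quality cutoff and thread count are all assumptions chosen for the example. Picard only flags duplicates by default, so `REMOVE_DUPLICATES=true` is assumed here because the description above states that duplicates are removed.

```python
from typing import List


def lane_commands(sample: str, lane: str, r1: str, r2: str, reference: str,
                  adapter: str = "AGATCGGAAGAGC", threads: int = 4) -> List[List[str]]:
    """Build per-lane commands: trim, align, sort and index (hypothetical helper)."""
    prefix = f"{sample}.{lane}"
    trimmed_r1 = f"{prefix}.R1.trimmed.fastq.gz"
    trimmed_r2 = f"{prefix}.R2.trimmed.fastq.gz"
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    # Read group so that lanes stay distinguishable after merging.
    read_group = f"@RG\\tID:{prefix}\\tSM:{sample}"
    return [
        # Trim adapters (-a/-A) and low-quality 3' ends (-q) from both mates.
        ["cutadapt", "-a", adapter, "-A", adapter, "-q", "20",
         "-o", trimmed_r1, "-p", trimmed_r2, r1, r2],
        # Align the trimmed read pairs to the reference genome.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", sam, reference, trimmed_r1, trimmed_r2],
        # Coordinate-sort and index the lane-level alignments.
        ["samtools", "sort", "-o", sorted_bam, sam],
        ["samtools", "index", sorted_bam],
    ]


def sample_commands(sample: str, lane_bams: List[str]) -> List[List[str]]:
    """Build per-sample commands: merge lanes, remove duplicates, index."""
    merged = f"{sample}.merged.bam"
    final = f"{sample}.bam"
    return [
        ["samtools", "merge", merged, *lane_bams],
        # Assumption: duplicates are removed rather than only flagged.
        ["picard", "MarkDuplicates", f"I={merged}", f"O={final}",
         f"M={sample}.markdup_metrics.txt", "REMOVE_DUPLICATES=true"],
        ["samtools", "index", final],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. In the workflow itself, Snakemake resolves the order of these steps from their input and output files.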
QC statistics are generated using fastqc, samtools stats and picard CollectHsMetrics (to assess bait coverage). The stats are summarized into a single report using multiqc.
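As a rough sketch of the QC stage, the commands could look like the following. The helper name, output directory layout and interval-list arguments are assumptions for illustration and do not reflect the workflow's actual configuration.

```python
import shlex
from typing import List


def qc_commands(sample: str, fastqs: List[str], bam: str,
                baits: str, targets: str, outdir: str = "qc") -> List[str]:
    """Build shell command strings for the QC tools (hypothetical helper)."""
    return [
        # Per-read quality report for each FASTQ file.
        shlex.join(["fastqc", "--outdir", outdir, *fastqs]),
        # samtools stats writes to stdout, so redirect it to a file.
        shlex.join(["samtools", "stats", bam])
        + " > " + shlex.quote(f"{outdir}/{sample}.stats.txt"),
        # Bait/target coverage metrics; both interval lists are required.
        shlex.join(["picard", "CollectHsMetrics", f"I={bam}",
                    f"O={outdir}/{sample}.hs_metrics.txt",
                    f"BAIT_INTERVALS={baits}",
                    f"TARGET_INTERVALS={targets}"]),
        # Summarize everything found in the QC directory into one report.
        shlex.join(["multiqc", "--outdir", outdir, outdir]),
    ]
```

The strings are meant to be run through a shell (for example `subprocess.run(cmd, shell=True, check=True)`) because of the output redirection.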
The workflow results in the following dependency graph:
The PDX workflow is a slightly modified version of the standard workflow, which aligns the reads to two reference genomes (the host and graft reference genomes) and uses disambiguate to remove sequences originating from the host organism. See the docs for more details.
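The idea behind the disambiguation step can be illustrated with a toy example. This is a simplified sketch and not the disambiguate tool's actual logic: it assumes each read has at most one alignment score per genome (higher is better), whereas the real tool operates on BAM files and aligner-specific tags. The function name and score dictionaries are hypothetical.

```python
from typing import Dict


def assign_reads(graft_scores: Dict[str, int],
                 host_scores: Dict[str, int]) -> Dict[str, str]:
    """Label each read as 'graft', 'host' or 'ambiguous' by comparing scores."""
    assignment = {}
    for read in graft_scores.keys() | host_scores.keys():
        graft = graft_scores.get(read)  # None if unaligned to the graft genome
        host = host_scores.get(read)    # None if unaligned to the host genome
        if host is None or (graft is not None and graft > host):
            assignment[read] = "graft"
        elif graft is None or host > graft:
            assignment[read] = "host"
        else:
            # Equal scores on both genomes: origin cannot be decided.
            assignment[read] = "ambiguous"
    return assignment


labels = assign_reads(
    {"r1": 60, "r2": 30, "r3": 50},
    {"r2": 55, "r3": 50, "r4": 40},
)
# Only reads labelled 'graft' would be kept for downstream analysis.
graft_reads = sorted(read for read, label in labels.items() if label == "graft")
```

In this example `r1` aligns only to the graft genome and is kept, `r2` and `r4` are assigned to the host, and `r3` scores equally on both genomes and is treated as ambiguous.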
Documentation is available at jrderuiter.github.io/snakemake-exome.
This software is released under the MIT license.