Exome Sequencing analysis pipeline for the Losic Group at Mt Sinai

Primary language: Nextflow. License: MIT.

LosicLab/exoseq

Introduction

LosicLab/ExoSeq is a bioinformatics pipeline that performs best-practice analysis of exome sequencing data. It is forked from nf-core/exoseq.

The pipeline follows the GATK best practices and is built with Nextflow, a workflow management system for bioinformatics. Its main steps are listed below; more information about each process can be found in the pipeline documentation.

  • Alignment - BWA
  • Marking duplicates - Picard
  • Recalibration - GATK4
  • Realignment - GATK4
  • Variant calling (somatic or SNP) - GATK4
  • Variant filtration - GATK4
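The step order above can be sketched as the shell commands a single sample might pass through. This is a hypothetical Python helper, not the pipeline's own Nextflow code: the function name `exome_commands`, the file names, the `samtools sort` step after alignment, and the example `QD < 2.0` hard filter are all assumptions, and the processes, options and reference files actually used by LosicLab/exoseq may differ. It needs Python 3.8+ for `shlex.join`.

```python
import shlex
from typing import List, Tuple


def exome_commands(sample: str, r1: str, r2: str, ref: str,
                   known_sites: str, intervals: str,
                   mode: str = "germline",
                   threads: int = 4) -> List[Tuple[str, str]]:
    """Return (step, shell command) pairs for one sample, in pipeline order.

    Hypothetical sketch: file names and filter thresholds are illustrative.
    """
    if mode not in ("germline", "somatic"):
        raise ValueError("mode must be 'germline' or 'somatic'")

    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    raw_vcf = f"{sample}.vcf.gz"
    filtered_vcf = f"{sample}.filtered.vcf.gz"

    # 1. Alignment: bwa mem writes SAM to stdout; samtools (an assumption,
    #    not named in the README) sorts it into a coordinate-sorted BAM.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, r1, r2]
    sort = ["samtools", "sort", "-o", sorted_bam, "-"]
    steps = [("align", shlex.join(bwa) + " | " + shlex.join(sort))]

    # 2. Mark duplicates: Picard's MarkDuplicates, run here through the
    #    GATK4 launcher, which bundles the Picard tools.
    steps.append(("mark_duplicates", shlex.join(
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dedup.metrics.txt"])))

    # 3. Base quality score recalibration: build an error model against
    #    known variant sites, then apply it to the reads.
    steps.append(("base_recalibrator", shlex.join(
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
         "--known-sites", known_sites, "-L", intervals, "-O", recal_table])))
    steps.append(("apply_bqsr", shlex.join(
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-L", intervals,
         "-O", recal_bam])))

    # 4. Realignment: GATK4 has no standalone IndelRealigner; HaplotypeCaller
    #    and Mutect2 reassemble reads locally, so this sketch emits no
    #    separate realignment command.

    # 5. Variant calling restricted to the exome target intervals
    #    (Mutect2 is shown in tumor-only mode).
    caller = "HaplotypeCaller" if mode == "germline" else "Mutect2"
    steps.append(("call_variants", shlex.join(
        ["gatk", caller, "-R", ref, "-I", recal_bam, "-L", intervals,
         "-O", raw_vcf])))

    # 6. Filtration: an example hard filter for germline calls,
    #    FilterMutectCalls for somatic calls.
    if mode == "germline":
        filt = ["gatk", "VariantFiltration", "-R", ref, "-V", raw_vcf,
                "--filter-expression", "QD < 2.0", "--filter-name", "QD2",
                "-O", filtered_vcf]
    else:
        filt = ["gatk", "FilterMutectCalls", "-R", ref, "-V", raw_vcf,
                "-O", filtered_vcf]
    steps.append(("filter_variants", shlex.join(filt)))
    return steps


if __name__ == "__main__":
    for step, cmd in exome_commands("S1", "S1_R1.fq.gz", "S1_R2.fq.gz",
                                    "ref.fa", "dbsnp.vcf.gz",
                                    "targets.interval_list"):
        print(f"# {step}\n{cmd}")
```

Each step consumes the previous step's output file, which mirrors how a workflow manager such as Nextflow chains processes through their outputs; the sketch only builds command strings and does not run anything.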

Documentation

The LosicLab pipeline ships with documentation forked from the original nf-core repository, found in the docs/ directory:

  1. Pipeline installation instructions
  2. Pipeline configuration
  3. Running the pipeline
  4. Output and how to interpret the results
  5. Troubleshooting

The pipeline now also supports the MSSM Minerva HPC cluster. Example run scripts can be found in the scripts/run_scripts folder.

Credits

The original nf-core/exoseq pipeline was initially developed by Senthilkumar Panneerselvam (@senthil10), with a little help from Phil Ewels (@ewels), at the National Genomics Infrastructure, part of SciLifeLab in Stockholm. It has since been extended by Alex Peltzer (@apeltzer) and Marie Gauder (@mgauder) from QBIC Tuebingen, Germany, and by Marc Hoeppner (@marchoeppner) from IKMB Kiel, Germany.

Many thanks also to others who have helped along the way, including @pditommaso and @colindaven.