/megLR

Snakemake workflow for comprehensive ONT analysis

Primary LanguagePython

Analysis pipeline for ONT long read data

This is a Snakemake pipeline for comprehensive analysis of longread sequencing data. It is mainly intented for research and method establishment at the Institute of medicinal genetics and applied genomics (IMGAG) of UKT Tübingen but can be adapted easily to fit your requirements.

Overview

Pipeline overview

Detailed information for the transcriptome analysis can be found in the document at doc/transcriptome_analysis.md.

Analyses

Installation

Dependencies:

  • Snakemake (pip install snakemake)
  • Conda
  • Docker/Singularity for Variant Calling with Pepper-Margin-Deepvariant

For cDNA (Transcriptome) analysis you need to manually install the following tool and adjust the application paths in the config file:

All other required tools will be installed into generated Conda environments.

Configuration

All configuration takes place in a config file (config.yml) located in the work directory. Look into the default settings (config/config_defaults) for a list of options. Most important is to to change the path to the genome reference and select required analysis steps.

The pipeline should be run in an analysis folder that will contain final and intermediate results. The location of the raw data is defined by the sample_run_table.tsv placed in the working dir. In the simplest form this is a two-column tab separated textfile containing sample names and data folder:

sample1
sample1
sample2

For more config options look into the input configuration.

Run the pipeline

Assuming you are in the working directory which contains a config.yml and a sample_run_table.tsv run the pipeline with:

snakemake --use-conda -s /path_to_repo/megLR/workflow/Snakefile