a-hr/ONT-splicing-pipeline pipeline parameters

Introduction

A pipeline to process the sequencing output coming from Plasmidsaurus. It can also be used with standard Nanopore data, but keep in mind that each sample must be provided as a single FASTQ file.

The pipeline performs the following steps:

QC of the FASTQ files
Alignment of the reads to the reference genome
Generation of BAM files
Generation of sashimi-plots using ggsashimi

Installation

The pipeline is written in Nextflow, a workflow manager that allows to run the pipeline in a wide variety of systems. It is configured to be run either on a SLURM-managed HPC cluster or a local machine, though it can be run on a cloud instance or using other workload managers by editing the configuration file according to Nextflow documentation.

Installing in a local machine

In your local machine, there are two ways of running the pipeline:

Using EPI2ME: the easiest way for those without bioinformatics experience. It's just a graphical interface that allows you to run the pipeline. Below will be explained how to install it.
Using the command line: for those with bioinformatics experience. It follows the same procedure as running in a cluster, so it will be explained in the Installing in a cluster section.

Installing EPI2ME

Install EPI2ME on your system and follow the instructions on the app to install all the dependencies (Java, Docker and Nextflow). To add the pipeline to your saved workflows simply copy this repository's URL and paste it on the "Add workflow" section of the EPI2ME interface.

Installing in a cluster

If using Nextflow/nf-core, clone the repository and install the basic dependencies (Nextflow). The easiest way to do so is using conda. The pipeline can be run on any system that supports Docker or Singularity. If using Windows, we recommend using the Windows Subsystem for Linux (WSL).

git clone https://github.com/a-hr/plasmidsaurus_nextflow.git

The internal dependencies of the pipeline are managed by Nextflow, so you don't need to worry about them. If for some reason Nextflow fails to download them when using Singularity (they are provided as Docker containers), you can manually download them with the Makefile:

# make sure you have Singularity installed and available
make pull

The pipeline is especially tailored to be run on a HPC cluster, though it can seamlessly be run on a local machine and, with some configuration, on a cloud instance.

Usage

Running with EPI2ME

Open EPI2ME and go to the "Workflows" tab.
Select the workflow.
Fill in the parameters.
In the profile section, type in local_docker.
Run the pipeline.

Running with the command line

Go to the directory where you cloned the repository.
Fill in the parameters in the input_params.yaml file.
Make sure your system has Docker/Singularity and Nextflow available.
Run the pipeline in the cluster with the following command:

sbatch launch.sh

The launch script is configured to run the pipeline in a SLURM-managed HPC cluster. If you are using another workload manager, you will need to edit the script accordingly.

If you are running the pipeline in a local machine, you can run it with the following command:

nextflow run main.nf -profile local_docker -params-file input_params.yaml
# or with Singularity
nextflow run main.nf -profile local_singularity -params-file input_params.yaml

Parameters

Run options

Parameter	Description	Type	Default	Hidden
`run_name`	name of the experiment (files will be named after it)	`string`	plasmidsaurus-mar24
`output_dir`		`string`	/Users/varo/Desktop/pipe_plasmidsaurus/plasmidsaurus_pipeline/output-plasmidsaurus-mar24	True
`get_bams`	whether to output the aligned BAMs	`boolean`	True
`get_sashimis`	whether to generate sashimi plots	`boolean`	True

Input parameters

Parameter	Description	Type	Default	Required
`input_fastq`	path to the directory containing the FASTQ files to align	`string`	/Users/varo/Desktop/pipe_plasmidsaurus/plasmidsaurus_pipeline/inputs/fastqs	True
`ref_fa`	path to the reference FASTA	`string`	/scratch/heral/indexes/GRCh38.primary_assembly.genome.fa	True
`ref_bed`	path to the reference BED file to allow splice-aware alignment	`string`	/scratch/heral/indexes/gencode.v41.primary_assembly.annotation.bed	True
`ref_gtf`	path to the reference GTF file to include transcript annotations in sashimis	`string`	/scratch/heral/indexes/gencode.v41.primary_assembly.annotation.gtf
`plots_config`	path to the CSV file containing plot configuration. Help The semicolon (;) separated file should have the following fields: - plotID : the ID that groups twogether the BAMs of the plot. Can be repeated as many times as necessary. - coords : the coordinates that will be used in the plot. Format: chr:start-end - fastqName: the name, without extension, of the file to include in the plot. Can be used in more than one plot. - groupName: the group the file belongs to (e.g WT, KO...). Groups together different files inside a specific plot.	`string`	/Users/varo/Desktop/pipe_plasmidsaurus/plasmidsaurus_pipeline/inputs/plots.csv	True

QC options

Parameter	Description	Type	Default	Required	Hidden
`min_len`	minimum length a read should have in order to be processed	`integer`	200
`max_len`	maximum length a read should have in order to be processed	`integer`	5000

Plot options

ggsashimi's internal options

Parameter	Description	Type	Default
`sashimi_min_cov`	minimum coverage of an event in order to be included	`integer`	5
`sashimi_alpha`	alpha to apply on the coverage colour	`number`	0.6
`sashimi_collapse_groups`	collapse the files by group	`boolean`
`sashimi_shrink`	shrink the intronic regions without coverage	`boolean`
`sashimi_fix_scale`	set the same Y-axis scale to all the groups/files	`boolean`	True
`sashimi_annot_height`	height of the annotations in the transcript track	`integer`	5
`sashimi_width`	width in cm of the output plot	`integer`	15