A pipeline to process the sequencing output coming from Plasmidsaurus. It can also be used with standard Nanopore data, but keep in mind that each sample must be provided as a single FASTQ file.
The pipeline performs the following steps:
- QC of the FASTQ files
- Alignment of the reads to the reference genome
- Generation of BAM files
- Generation of sashimi-plots using ggsashimi
The pipeline is written in Nextflow, a workflow manager that allows the pipeline to run on a wide variety of systems. It is configured to run either on a SLURM-managed HPC cluster or on a local machine, though it can also run on a cloud instance or with other workload managers by editing the configuration file as described in the Nextflow documentation.
On your local machine, there are two ways of running the pipeline:
- Using EPI2ME: the easiest way for those without bioinformatics experience. It is a graphical interface that lets you run the pipeline. Installation is explained below.
- Using the command line: for those with bioinformatics experience. It follows the same procedure as running on a cluster, so it is explained in the "Installing in a cluster" section.
Install EPI2ME on your system and follow the instructions in the app to install all the dependencies (Java, Docker and Nextflow). To add the pipeline to your saved workflows, copy this repository's URL and paste it into the "Add workflow" section of the EPI2ME interface.
If using Nextflow/nf-core, clone the repository and install the basic dependencies (Nextflow). The easiest way to do so is using conda. The pipeline can be run on any system that supports Docker or Singularity. If using Windows, we recommend using the Windows Subsystem for Linux (WSL).
git clone https://github.com/a-hr/plasmidsaurus_nextflow.git
The internal dependencies of the pipeline are managed by Nextflow, so you don't need to worry about them. If for some reason Nextflow fails to download them when using Singularity (they are provided as Docker containers), you can manually download them with the Makefile:
# make sure you have Singularity installed and available
make pull
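Once the installation is done, a quick check that the required tools are on the `PATH` can save a failed launch. The sketch below is only an illustration: `check_tool` and `check_prerequisites` are hypothetical helper names that are not part of this repository, and it assumes the executables are called `nextflow`, `docker` and `singularity` on your system.

```shell
# Hypothetical helper: report whether a single executable is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Nextflow is always needed; one container engine (Docker or Singularity) is enough.
check_prerequisites() {
  check_tool nextflow || return 1
  check_tool docker || check_tool singularity || return 1
}

check_prerequisites || echo "Install the missing tools before launching the pipeline." >&2
```

If Docker is missing but Singularity is present, the check prints a "missing: docker" message and still succeeds, since either engine is sufficient.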
The pipeline is especially tailored to be run on an HPC cluster, though it can seamlessly be run on a local machine and, with some configuration, on a cloud instance.
- Open EPI2ME and go to the "Workflows" tab.
- Select the workflow.
- Fill in the parameters.
- In the `profile` section, type in `local_docker`.
- Run the pipeline.
- Go to the directory where you cloned the repository.
- Fill in the parameters in the `input_params.yaml` file.
- Make sure your system has Docker/Singularity and Nextflow available.
- Run the pipeline on the cluster with the following command:
sbatch launch.sh
The launch script is configured to run the pipeline in a SLURM-managed HPC cluster. If you are using another workload manager, you will need to edit the script accordingly.
If you are running the pipeline on a local machine, you can run it with the following command:
nextflow run main.nf -profile local_docker -params-file input_params.yaml
# or with Singularity
nextflow run main.nf -profile local_singularity -params-file input_params.yaml
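As a rough sketch of what a filled-in parameters file could look like, the snippet below writes an example to a separate file, so the repository's `input_params.yaml` is left untouched. The key names are the parameters documented in this README; the flat YAML layout, the file name `input_params.example.yaml`, the run name and every path are assumptions or placeholders to be replaced with your own values.

```shell
# Write an example parameters file (hypothetical name; all values are placeholders).
params_file="input_params.example.yaml"

cat > "$params_file" <<'EOF'
run_name: "my-run"
output_dir: "/path/to/output"
get_bams: true
get_sashimis: true
input_fastq: "/path/to/fastqs"
ref_fa: "/path/to/genome.fa"
ref_bed: "/path/to/annotation.bed"
ref_gtf: "/path/to/annotation.gtf"
plots_config: "/path/to/plots.csv"
min_len: 200
max_len: 5000
EOF

echo "wrote $params_file"
```

The resulting file can then be passed to `-params-file` in place of `input_params.yaml` in the commands above.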
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
run_name | name of the experiment (files will be named after it) | string | plasmidsaurus-mar24 | | |
output_dir | | string | /Users/varo/Desktop/pipe_plasmidsaurus/plasmidsaurus_pipeline/output-plasmidsaurus-mar24 | True | |
get_bams | whether to output the aligned BAMs | boolean | True | | |
get_sashimis | whether to generate sashimi plots | boolean | True | | |
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
input_fastq | path to the directory containing the FASTQ files to align | string | /Users/varo/Desktop/pipe_plasmidsaurus/plasmidsaurus_pipeline/inputs/fastqs | True | |
ref_fa | path to the reference FASTA | string | /scratch/heral/indexes/GRCh38.primary_assembly.genome.fa | True | |
ref_bed | path to the reference BED file to allow splice-aware alignment | string | /scratch/heral/indexes/gencode.v41.primary_assembly.annotation.bed | True | |
ref_gtf | path to the reference GTF file to include transcript annotations in sashimis | string | /scratch/heral/indexes/gencode.v41.primary_assembly.annotation.gtf | | |
plots_config | path to the CSV file containing the plot configuration. The semicolon-separated (;) file should have the following fields: plotID (the ID that groups together the BAMs of a plot; can be repeated as many times as necessary), coords (the coordinates used in the plot, format chr:start-end), fastqName (the name, without extension, of the file to include in the plot; can be used in more than one plot), groupName (the group the file belongs to, e.g. WT, KO; groups together different files within a specific plot) | string | /Users/varo/Desktop/pipe_plasmidsaurus/plasmidsaurus_pipeline/inputs/plots.csv | True | |
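To make the `plots_config` layout concrete, here is a small sketch that writes an example file and checks its shape. Several things in it are assumptions rather than facts about the pipeline: the presence of a header row (drop it if the pipeline expects none), the file name `plots.example.csv`, the plot IDs, the sample names and the coordinates. `check_plots_csv` is a hypothetical helper, not something shipped with the repository.

```shell
# Example plot configuration: four semicolon-separated fields per row.
# Header row, IDs, coordinates and sample names are illustrative assumptions.
plots_csv="plots.example.csv"

cat > "$plots_csv" <<'EOF'
plotID;coords;fastqName;groupName
geneA_exon3;chr1:1000-5000;sample_WT_1;WT
geneA_exon3;chr1:1000-5000;sample_KO_1;KO
geneB_full;chr2:20000-42000;sample_WT_1;WT
EOF

# Hypothetical check: every data row has 4 fields and coords look like chr:start-end.
check_plots_csv() {
  awk -F';' '
    NR == 1 { next }   # skip the assumed header row
    NF != 4 { printf "line %d: expected 4 fields, found %d\n", NR, NF; bad = 1; next }
    $2 !~ /^[^:]+:[0-9]+-[0-9]+$/ { printf "line %d: coords \"%s\" is not chr:start-end\n", NR, $2; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

check_plots_csv "$plots_csv" && echo "plots config looks well-formed"
```

In this example the two `geneA_exon3` rows end up in the same plot as separate groups (WT and KO), while `sample_WT_1` is reused in a second plot.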
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
min_len | minimum length a read should have in order to be processed | integer | 200 | | |
max_len | maximum length a read should have in order to be processed | integer | 5000 | | |
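The effect of `min_len` and `max_len` can be pictured with a small length filter. This is an illustration only and not the pipeline's own implementation: the function name `filter_fastq_by_length` is hypothetical, and the sketch assumes uncompressed four-line FASTQ records and inclusive bounds.

```shell
# Hypothetical helper: keep only reads whose length lies within [min, max].
# Usage: filter_fastq_by_length MIN MAX < in.fastq > out.fastq
filter_fastq_by_length() {
  awk -v min="$1" -v max="$2" '
    NR % 4 == 1 { header = $0 }   # @read_id line
    NR % 4 == 2 { seq = $0 }      # sequence line
    NR % 4 == 3 { plus = $0 }     # separator line
    NR % 4 == 0 {                 # quality line closes the record
      n = length(seq)
      if (n >= min && n <= max) print header "\n" seq "\n" plus "\n" $0
    }
  '
}
```

With the default values, a call would look like `filter_fastq_by_length 200 5000 < sample.fastq > sample.filtered.fastq`, where the file names are placeholders.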
ggsashimi's internal options
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
sashimi_min_cov | minimum coverage of an event in order to be included | integer | 5 | | |
sashimi_alpha | alpha to apply on the coverage colour | number | 0.6 | | |
sashimi_collapse_groups | collapse the files by group | boolean | | | |
sashimi_shrink | shrink the intronic regions without coverage | boolean | | | |
sashimi_fix_scale | set the same Y-axis scale to all the groups/files | boolean | True | | |
sashimi_annot_height | height of the annotations in the transcript track | integer | 5 | | |
sashimi_width | width in cm of the output plot | integer | 15 | | |