A post-sequencing QC tool for Oxford Nanopore sequencers
This workflow performs quality control (QC) of Oxford Nanopore sequencing runs and supports both DNA-Seq and RNA-Seq data.
This QC tool supports only output from the Guppy and Dorado basecallers: the sequencing_summary.txt and sequencing_telemetry.js files. Flow cell and kit versions are retrieved from the telemetry file, but a single FAST5 file can be used instead if no telemetry file is provided. If the sequencing summary file is not available, ToulligQC can also accept FAST5, FASTQ or BAM files, although this significantly increases the running time.
ToulligQC can analyse barcoded samples when the list of barcodes is supplied as an option. It also handles inputs compressed in the gz, tar.gz, bz2 and tar.bz2 formats. The tool produces a set of graphs, a statistics file in plain-text format and an HTML report.
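Because these compression formats are recognised by their file suffix, a quick pre-flight check can classify an input path before launching a run. The function below is a hypothetical sketch: compression_type is not part of ToulligQC or the workflow, and it assumes the format can be identified from the file extension alone.

```shell
# Hypothetical helper (not part of ToulligQC): report which of the supported
# compression formats a path uses, judging only by its extension.
compression_type() {
  # case stops at the first matching pattern, so the tar.* suffixes come first
  case "$1" in
    *.tar.gz)  echo "tar.gz" ;;
    *.tar.bz2) echo "tar.bz2" ;;
    *.gz)      echo "gz" ;;
    *.bz2)     echo "bz2" ;;
    *)         echo "none" ;;  # uncompressed or unsupported suffix
  esac
}

compression_type sequencing_summary.txt.gz   # prints: gz
compression_type fast5_pass.tar.bz2          # prints: tar.bz2
```

An uncompressed summary can be turned into a supported input with, for example, gzip -k sequencing_summary.txt, which keeps the original file and writes sequencing_summary.txt.gz next to it.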
Minimum requirements:
- CPUs = 1
- Memory = 4GB
Approximate run time: 5 minutes for 10 million reads with the minimum requirements.
These are instructions to install and run the workflow on the command line. You can also access the workflow via the EPI2ME Desktop application.
The workflow uses Nextflow to manage compute and software resources, so Nextflow must be installed before you attempt to run the workflow.
The workflow can currently be run using either [Docker](https://www.docker.com/products/docker-desktop) or Singularity to provide isolation of the required software. Both methods work out of the box, provided either Docker or Singularity is installed. The choice is controlled by the Nextflow -profile parameter.
You do not need to clone or download the git repository to run the workflow. More information on running EPI2ME workflows can be found on the EPI2ME website.
The following command can be used to obtain the workflow. It pulls the repository into the Nextflow assets folder and prints a list of all parameters available for the workflow, as well as an example command:
```shell
nextflow run genomiqueens/wf-toulligqc --help
```
To update the workflow to the latest version on the command line, use the following command:
```shell
nextflow pull genomiqueens/wf-toulligqc
```
A demo dataset is provided for testing the workflow. It can be downloaded and unpacked using the following commands:
```shell
wget https://github.com/GenomiqueENS/wf-toulligqc/raw/main/demo_data/wf-toulligqc-demo.tar.gz
tar -xzvf wf-toulligqc-demo.tar.gz
```
The workflow can then be run with the downloaded demo data using:
```shell
nextflow run genomiqueens/wf-toulligqc \
    --input_files 'sequencing_summary + telemetry_source' \
    --sequencing_summary_source 'demo_data/sequencing_summary.txt' \
    --telemetry_source 'demo_data/sequencing_telemetry.js'
```
For further information about running a workflow on the command line, see https://labs.epi2me.io/wfquickstart/
This workflow is designed to take input sequences that have been produced from Oxford Nanopore Technologies devices.
Find related protocols in the Nanopore community.
Nextflow parameter name | Type | Description | Help | Default |
---|---|---|---|---|
input_files | string | Select which type or combination of input files to use for the analysis | The workflow can be run with only the Guppy/Dorado basecaller output file sequencing_summary.txt, or with the additional sequencing_telemetry.js file. It can also be run with only FASTQ, BAM or FAST5 files. | sequencing_summary.txt only |
Nextflow parameter name | Type | Description | Help | Default |
---|---|---|---|---|
sequencing_summary_source | string | Basecaller sequencing summary source, can be compressed with gzip (.gz) or bzip2 (.bz2) | ||
telemetry_source | string | Basecaller telemetry file source, can be compressed with gzip (.gz) or bzip2 (.bz2) | ||
fast5 | string | FAST5 file source, can also be a tar.gz/tar.bz2 archive or a directory | Necessary if no telemetry file | |
fastq | string | FASTQ files to use in the analysis, can also be gzip-compressed (.gz) | Necessary if no sequencing summary file | |
bam | string | BAM or SAM files to use in the analysis | Necessary if no sequencing summary file | |
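The Help column above implies a few dependencies between inputs. The function below is a hypothetical pre-flight sketch (check_inputs is not part of the workflow) that encodes them: it fails when neither a sequencing summary nor FASTQ, BAM or FAST5 files are given, and only warns when neither a telemetry file nor a FAST5 file is available. Treating the missing telemetry case as a warning, and assuming flow cell and kit versions may then be absent from the report, are assumptions rather than documented workflow behaviour.

```shell
# Hypothetical pre-flight check (not part of wf-toulligqc).
# Arguments, in order: summary telemetry fast5 fastq bam; pass "" for a missing input.
check_inputs() {
  summary=$1
  telemetry=$2
  fast5=$3
  fastq=$4
  bam=$5

  # Without a sequencing summary, reads must come from FASTQ, BAM or FAST5 files.
  if [ -z "$summary" ] && [ -z "$fastq" ] && [ -z "$bam" ] && [ -z "$fast5" ]; then
    echo "error: provide a sequencing summary or FASTQ/BAM/FAST5 files" >&2
    return 1
  fi

  # Flow cell and kit versions come from the telemetry file, or from a FAST5 file
  # when no telemetry file is given (assumption: otherwise they may be missing).
  if [ -z "$telemetry" ] && [ -z "$fast5" ]; then
    echo "warning: no telemetry or FAST5 file; flow cell and kit versions may be unavailable" >&2
  fi
  return 0
}

# The demo inputs pass silently.
check_inputs demo_data/sequencing_summary.txt demo_data/sequencing_telemetry.js "" "" ""
```

The check only looks at which parameters are set, not at whether the paths exist; adding [ -f "$path" ] tests would be a natural extension.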
Nextflow parameter name | Type | Description | Help | Default |
---|---|---|---|---|
barcoding | boolean | Enable the analysis of barcoded samples. | | False |
barcodes | string | Comma-separated barcode list (e.g. BC05,RB09,NB01,barcode10) | ToulligQC handles the following naming schemes: BCXX, RBXX, NBXX and barcodeXX, where XX is the number of the barcode | |
barcoding_summary_pass | string | Basecaller barcoding summary source of passed reads, can be compressed with gzip (.gz) or bzip2 (.bz2). | ||
barcoding_summary_fail | string | Basecaller barcoding summary source of failed reads, can be compressed with gzip (.gz) or bzip2 (.bz2). | ||
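Before passing a list to the barcodes parameter, it can be checked against the naming schemes listed in the Help column. The helper below is a hypothetical sketch (check_barcodes is not part of ToulligQC); it assumes that XX is exactly two digits, as in the examples, that names are case-sensitive, and that the list contains no spaces.

```shell
# Hypothetical helper (not part of ToulligQC): validate a comma-separated barcode list.
# Assumptions: XX is exactly two digits, names are case-sensitive, no spaces in the list.
check_barcodes() {
  list=$1
  [ -n "$list" ] || { echo "empty barcode list" >&2; return 1; }
  status=0
  old_ifs=$IFS
  IFS=,                       # split the list on commas only
  for bc in $list; do
    case $bc in
      BC[0-9][0-9]|RB[0-9][0-9]|NB[0-9][0-9]|barcode[0-9][0-9]) ;;  # supported scheme
      *) echo "unsupported barcode name: $bc" >&2; status=1 ;;
    esac
  done
  IFS=$old_ifs                # restore the caller's field separator
  return $status
}

check_barcodes "BC05,RB09,NB01,barcode10" && echo "barcode list OK"   # prints: barcode list OK
```

Every invalid name is reported, not just the first, so a single run shows all entries that need fixing.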
Nextflow parameter name | Type | Description | Help | Default |
---|---|---|---|---|
report_name | string | Name to give to the report | ||
disable_ping | boolean | Enable to prevent sending a workflow ping. | | False |
Title | File path | Description | Per sample or aggregated |
---|---|---|---|
workflow report | ./wf-template-report.html | Report for all samples. | aggregated |
See the EPI2ME website for lots of other resources and blog posts.