nart

A tool for Nanopore Amplicon Real-Time (NART) analysis.


NART is designed for mapping-based Nanopore Amplicon (Real-Time) analysis, e.g., of the 16S rRNA gene. The NART utilities comprise nart (Nanopore Amplicon Real-Time entry) and nawf (Nanopore Amplicon snakemake WorkFlow entry) in one Python package. NART provides a (real-time) end-to-end solution from basecalled reads to the final count matrix through a mapping-based strategy.

Important: NART is under development and released here as a preview. NART has only been tested on Linux systems, i.e., Ubuntu.

Demo

Demo video on YouTube

DAG workflow

nawf provides three options (i.e., emu, minimap2lca and blast2lca) to determine the microbial composition.
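
The classifier is chosen at configuration time. A minimal example (the database path is a placeholder; the full option list is shown under nawf config below):

nawf config -d /path/to/database --classifier minimap2lca    # switch from the default emu to minimap2 + LCA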

Docker image

The easiest way to use NART is to pull the docker image from Docker Hub for cross-platform support.

docker pull yanhui09/nart

To use the Docker image, you need to mount your data directory, e.g., `pwd`, to /home in the container.

docker run -it -v `pwd`:/home --network host --privileged yanhui09/nart

Note: --network host is required for nart monitor to work.

The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server. [Read more]
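
If the reference database lives outside your data directory, it can be mounted as a second volume; a sketch with placeholder paths (adjust the host path to your setup):

docker run -it -v `pwd`:/home -v /path/to/database:/database --network host --privileged yanhui09/nart

Inside the container, /database can then be passed to nawf config -d /database.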

Installation from the GitHub repository

Conda is the only dependency required prior to installation. Miniconda is sufficient for the whole pipeline.

  1. Clone the GitHub repository and create an isolated conda environment
git clone https://github.com/yanhui09/nart.git
cd nart
conda env create -n nart -f env.yaml 

You can speed up the whole process if mamba is installed.

mamba env create -n nart -f env.yaml 
  2. Install NART with pip

To avoid inconsistency, we suggest installing NART in the conda environment created above.

conda activate nart
pip install --editable .
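
To verify the installation, both entry points should print their version and help:

nart --version
nawf --help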

At the moment, NART uses guppy or minibar for custom barcode demultiplexing (in our lab).
Remember to prepare the barcoding files for guppy or minibar if new barcodes are introduced.

Quick start

Remember to activate the conda environment if NART is installed in a conda environment.

conda activate nart

Amplicon analysis in single batch

nawf can be used to profile any single basecalled fastq file from a Nanopore run or batch.

nawf config -b /path/to/single_basecall_fastq -d /path/to/database    # init config file and check
nawf run all                                                          # start analysis
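
nawf run also accepts intermediate targets (listed under nawf run below), which is handy for checking a run step by step; a sketch:

nawf run demux                                                        # run up to the demultiplexing target
nawf run qc                                                           # run up to the qc target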

Real-time analysis

nart provides utilities to record, process and profile the continuously generated fastq batches.

Before starting real-time analysis, you need nawf to configure the workflow according to your needs.

nawf config -d /path/to/database                                      # init config file and check

In common cases, you need three independent sessions to handle monitoring, processing and visualization, respectively (one way to keep them running with tmux is sketched after the list).

  1. Monitor the basecall output and record new fastq files
nart monitor -q /path/to/basecall_fastq_dir                    # monitor basecall output
  2. Start amplicon analysis for new fastq batches
nart run -t 10                                                 # real-time process in batches
  3. Update the feature table for interactive visualization in the browser
nart visual                                                    # interactive visualization
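
A terminal multiplexer such as tmux is one way to keep the three sessions alive, e.g. (session names are arbitrary, paths are placeholders):

tmux new-session -d -s nart-monitor 'nart monitor -q /path/to/basecall_fastq_dir'
tmux new-session -d -s nart-run 'nart run -t 10'
tmux new-session -d -s nart-visual 'nart visual'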

Usage

NART is composed of two sets of scripts, nart and nawf, which control real-time analysis and workflow execution, respectively.

nart

Usage: nart [OPTIONS] COMMAND [ARGS]...

  NART: A tool for Nanopore Amplicon Real-Time (NART) analysis. To follow
  updates and report issues, see: https://github.com/yanhui09/nart.

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  monitor  Start NART to monitor a directory.
  run      Start NART workflow.
  visual   Start NART app to interactively visualize the results.
Usage: nart monitor [OPTIONS]

  Start NART monitor.

Options:
  -q, --query PATH       A query directory to monitor the new fastq files.
  -e, --extension TEXT   The file extension to monitor for (e.g. '.fastq.gz').
                         [default: .fastq.gz]
  -w, --workdir PATH     Workflow working directory.  [default: .]
  -t, --timeout INTEGER  Stop query if no new files were generated within the
                         given minutes.  [default: 30]
  -h, --help             Show this message and exit.
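
For example, monitoring a basecaller output directory for gzipped fastq files and allowing a longer idle time (paths and values are illustrative):

nart monitor -q /path/to/basecall_fastq_dir -e .fastq.gz -t 60
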
Usage: nart run [OPTIONS] [SNAKE_ARGS]...

  Start NART.

Options:
  -w, --workdir PATH     Workflow working directory.  [default: .]
  -t, --timeout INTEGER  Stop run if no new files were updated in list within
                         the given minutes.  [default: 10]
  -c, --configfile FILE  Workflow config file. Use config.yaml in working
                         directory if not specified.
  -j, --jobs INTEGER     Maximum jobs to run in parallel.  [default: 6]
  -m, --maxmem FLOAT     Specify maximum memory (GB) to use. Memory is
                         controlled by profile in cluster execution.
  --profile TEXT         Snakemake profile for cluster execution.
  -n, --dryrun           Dry run.
  -h, --help             Show this message and exit.
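
For example, a dry run first, then a real-time run with more resources; trailing arguments after the options are passed on to Snakemake (values are illustrative):

nart run -n                                                    # dry run
nart run -j 10 -m 50                                           # up to 10 parallel jobs and ~50 GB memory
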
Usage: nart visual [OPTIONS]

Options:
  -p, --port INTEGER              Port to run the app on.  [default: 5000]
  -i, --input PATH                Path to the working directory.  [default: .]
  -w, --wait-time INTEGER         Time to wait (in minutes) if input file is
                                  missing.  [default: 5]
  --relative                      Use relative abundance instead of absolute
                                  abundance.
  --rm-unmapped                   Remove unmapped reads from the table.
  --min-abundance INTEGER         Minimum absolute abundance of a feature to
                                  plot.  [default: 1]
  --order-by [mean|median|alpha]  Order taxonomic features by mean, median, or
                                  alphabetically.  [default: mean]
  -h, --help                      Show this message and exit.
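
For example, serving relative abundances of features seen at least 10 times on a custom port (values are illustrative):

nart visual -p 8080 --relative --min-abundance 10 --order-by median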

nawf

Usage: nawf [OPTIONS] COMMAND [ARGS]...

  NAWF: A sub-tool to run Nanopore Amplicon WorkFlow. The workflow command
  initiates the NAWF in a single batch, using either a fastq file from one ONT
  run or a fastq file generated during sequencing. To follow updates and
  report issues, see: https://github.com/yanhui09/nart.

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  config  Generate the workflow config file.
  run     Start workflow in a single batch.
Usage: nawf config [OPTIONS]

  Config NAWF.

Options:
  -b, --bascfq PATH               Path to a basecalled fastq file. Option is
                                  mutually exclusive with 'demuxdir'.
  -x, --demuxdir PATH             Path to a directory of demultiplexed fastq
                                  files. Option is mutually exclusive with
                                  'bascfq'.
  -d, --dbdir PATH                Path to the taxonomy databases.  [required]
  -w, --workdir PATH              Output directory for NAWF.  [default: .]
  --demuxer [guppy|minibar]       Demultiplexer.  [default: guppy]
  --fqs-min INTEGER               Minimum number of reads for the
                                  demultiplexed fastqs.  [default: 50]
  --subsample                     Subsample the reads.
  --chimera-filt                  Filter chimeric reads.
  --primer-check                  Check primer pattern.
  --classifier [emu|minimap2lca|blast2lca]
                                  Classifier.  [default: emu]
  --jobs-min INTEGER              Number of jobs for common tasks.  [default:
                                  2]
  --jobs-max INTEGER              Number of jobs for threads-dependent tasks.
                                  [default: 6]
  -h, --help                      Show this message and exit.
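
For example, configuring the workflow from already-demultiplexed fastq files with chimera filtering and the blast2lca classifier (paths are placeholders):

nawf config -x /path/to/demultiplexed_fastq_dir -d /path/to/database --chimera-filt --classifier blast2lca
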
Usage: nawf run [OPTIONS] {init|demux|qc|all} [SNAKE_ARGS]...

  Run NAWF in a single batch.

Options:
  -w, --workdir PATH     Workflow working directory.  [default: .]
  -c, --configfile FILE  Workflow config file. Use config.yaml in working
                         directory if not specified.
  -j, --jobs INTEGER     Maximum jobs to run in parallel.  [default: 6]
  -m, --maxmem FLOAT     Specify maximum memory (GB) to use. Memory is
                         controlled by profile in cluster execution.
  --profile TEXT         Snakemake profile for cluster execution.
  -n, --dryrun           Dry run.
  -h, --help             Show this message and exit.
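
For example, a dry run of the full workflow followed by an actual run with more resources; trailing arguments are handed to Snakemake (values are illustrative):

nawf run all -n                                                # dry run
nawf run all -j 12 -m 64                                       # 12 parallel jobs and ~64 GB memory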