Nextflow Wrapper for TBProfiler

TBProfiler-NF: A Nextflow Wrapper for the TBProfiler Mycobacteria Genomics Analysis Tool

This is a Nextflow-based wrapper for the TBProfiler pipeline, which performs antimicrobial resistance (AMR) analysis of whole-genome sequence data from Mycobacterium tuberculosis complex samples. The pipeline provides a convenient way to analyze many samples at once on an HPC cluster. It also adds QC analysis of the input reads using fastp, and of the generated alignments using Qualimap.

flowchart TD
  reads --> fastp
  fastp -- trimmed_reads --> tbprofiler
  tbprofiler -- vcf --> snpit
  snpit --> check_snpit_against_tbprofiler
  tbprofiler -- bam --> qualimap_bamqc
  tbprofiler -- bam --> mpileup
  mpileup -- depths --> plot_coverage
  mpileup -- depths --> generate_low_coverage_bed
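As an illustration of the last step in the flowchart, the sketch below collapses per-position depths into BED intervals. It is not the pipeline's own implementation: the function name `low_coverage_intervals`, the `threshold` argument, and the input layout (tuples of reference name, 1-based position, depth, with every position listed in order) are all assumptions made for the example.

```python
def low_coverage_intervals(depths, threshold):
    """Yield (chrom, start, end) BED intervals where depth < threshold.

    `depths` is an iterable of (chrom, pos, depth) tuples with 1-based
    positions in ascending order (hypothetical input layout).
    BED intervals are 0-based and half-open, so position p maps to [p-1, p).
    """
    current = None  # [chrom, start, end] of the interval being built
    for chrom, pos, depth in depths:
        if depth < threshold:
            if current and current[0] == chrom and current[2] == pos - 1:
                current[2] = pos  # extend the open interval by one base
            else:
                if current:
                    yield tuple(current)
                current = [chrom, pos - 1, pos]  # start a new interval
        elif current:
            yield tuple(current)  # a well-covered base closes the interval
            current = None
    if current:
        yield tuple(current)


# Toy input: positions 1-2 and 4 fall below an assumed threshold of 10.
example = [("chrom", 1, 5), ("chrom", 2, 3), ("chrom", 3, 20), ("chrom", 4, 1)]
for chrom, start, end in low_coverage_intervals(example, threshold=10):
    print(f"{chrom}\t{start}\t{end}")
```

Adjacent low-depth positions are merged into a single interval so that the resulting BED file stays compact.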

Usage

nextflow run BCCDC-PHL/tbprofiler-nf \
  --fastq_input <fastq_input_dir> \
  --outdir <output_dir>

Parameters

| Flag | Default Value | Description |
|------|---------------|-------------|
| min_depth | 10 | Minimum depth of coverage used to call a SNP |
| min_af_used_for_calling | 0.1 | Minimum minor allele fraction used to call a SNP |
| min_af_used_for_prediction | 0.1 | Minimum minor allele fraction used to predict AMR |
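When launching runs from a script, the command can be assembled programmatically. The sketch below assumes that the parameters above are overridden with `--<name> <value>`, which is the usual Nextflow convention for pipeline parameters. The helper `build_command` and the example paths and values are hypothetical.

```python
import shlex


def build_command(fastq_input, outdir, **params):
    """Assemble a `nextflow run` invocation as a list of arguments."""
    cmd = [
        "nextflow", "run", "BCCDC-PHL/tbprofiler-nf",
        "--fastq_input", str(fastq_input),
        "--outdir", str(outdir),
    ]
    # Assumption: each parameter is passed as `--<name> <value>`.
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    "fastq/", "results/",
    min_depth=20,
    min_af_used_for_calling=0.2,
)
# Print a shell-safe rendering; pass `cmd` to subprocess.run() to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids quoting problems if a directory name contains spaces.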

Output

A directory will be created for each sample under the directory passed to the --outdir flag.

The following files will be produced for each sample:

.
└── sample-01
    ├── sample-01_TIMESTAMP_provenance.yml
    ├── sample-01_coverage_plot.png
    ├── sample-01_fastp.csv
    ├── sample-01_fastp.json
    ├── sample-01_low_coverage_regions.bed
    ├── sample-01_qualimap_alignment_qc.csv
    ├── sample-01_snpit.tsv
    ├── sample-01_snpit_unchecked.tsv
    ├── sample-01_tbprofiler.bam
    ├── sample-01_tbprofiler.bam.bai
    ├── sample-01_tbprofiler_full_report.csv
    ├── sample-01_tbprofiler_full_report.json
    ├── sample-01_tbprofiler_lineage.csv
    ├── sample-01_tbprofiler_resistance.csv
    ├── sample-01_tbprofiler_resistance_mutations.csv
    ├── sample-01_tbprofiler_summary.csv
    ├── sample-01_tbprofiler_targets.vcf
    └── sample-01_tbprofiler_whole_genome.vcf
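After a run, it can be useful to confirm that every expected file was written. The following is a minimal sketch based only on the listing above. The helper name `missing_outputs` is hypothetical, and the glob pattern for the provenance file assumes that TIMESTAMP contains no path separators.

```python
from pathlib import Path

# File name suffixes taken from the listing above (provenance handled separately).
EXPECTED_SUFFIXES = [
    "_coverage_plot.png",
    "_fastp.csv",
    "_fastp.json",
    "_low_coverage_regions.bed",
    "_qualimap_alignment_qc.csv",
    "_snpit.tsv",
    "_snpit_unchecked.tsv",
    "_tbprofiler.bam",
    "_tbprofiler.bam.bai",
    "_tbprofiler_full_report.csv",
    "_tbprofiler_full_report.json",
    "_tbprofiler_lineage.csv",
    "_tbprofiler_resistance.csv",
    "_tbprofiler_resistance_mutations.csv",
    "_tbprofiler_summary.csv",
    "_tbprofiler_targets.vcf",
    "_tbprofiler_whole_genome.vcf",
]


def missing_outputs(outdir, sample_id):
    """Return the names of expected files that are absent for one sample."""
    sample_dir = Path(outdir) / sample_id
    missing = [
        f"{sample_id}{suffix}"
        for suffix in EXPECTED_SUFFIXES
        if not (sample_dir / f"{sample_id}{suffix}").is_file()
    ]
    # The provenance file name embeds a timestamp, so match it with a glob.
    if not list(sample_dir.glob(f"{sample_id}_*_provenance.yml")):
        missing.append(f"{sample_id}_TIMESTAMP_provenance.yml")
    return missing
```

For example, `missing_outputs("results", "sample-01")` returns an empty list when the sample directory is complete.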

Provenance

In the output directory for each sample, a provenance file will be written with the following format:

- pipeline_name: BCCDC-PHL/tbprofiler-nf
  pipeline_version: 0.2.2
  nextflow_session_id: ee5b4986-6ada-4eab-a294-ed0cbb18427d
  nextflow_run_name: furious_murdock
  analysis_start_time: 2024-02-01T16:37:26.062501-08:00
- input_filename: SAMPLE-ID_S133_L001_R1_001.fastq.gz
  file_type: fastq-input
  sha256: 1b6a9a616ec3fd8432ff02f51d60fb6443617c29761b96234ede9c65efe06547
- input_filename: SAMPLE-ID_S133_L001_R2_001.fastq.gz
  file_type: fastq-input
  sha256: f6954b1a174fbead8a035ae7cdfda549fcc751be8847a330505df49de59bed96
- process_name: fastp
  tools:
    - tool_name: fastp
      tool_version: 0.23.2
      parameters:
        - parameter: --cut_tail
          value: null
- process_name: tbprofiler
  tools:
    - tool_name: tb-profiler
      tool_version: 4.3.0
      subcommand: profile
      parameters:
        - parameter: --platform
          value: illumina
        - parameter: --mapper
          value: bwa
        - parameter: --caller
          value: bcftools
        - parameter: --af
          value: 0.1
        - parameter: --reporting_af
          value: 0.1
        - parameter: --prefix
          value: 23s629
        - parameter: --csv
          value: null
        - parameter: --call_whole_genome
          value: null
- process_name: snpit
  tools:
    - tool_name: snpit
      tool_version: 1.0.0
      parameters:
        - parameter: --input
          value: 23s629_tbprofiler_whole_genome.vcf
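
The `sha256` entries make it possible to confirm later that the FASTQ files on disk are the ones that were analyzed. The sketch below assumes the provenance file has already been parsed into a list of dictionaries (for example with `yaml.safe_load` from the third-party PyYAML package) and that each checksum was computed over the file exactly as stored, that is, over the compressed `.fastq.gz`. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_inputs(provenance, fastq_dir):
    """Map each fastq-input filename to True if its checksum matches.

    `provenance` is the parsed provenance file: a list of dicts.
    Files that are missing from `fastq_dir` are reported as False.
    """
    results = {}
    for record in provenance:
        if record.get("file_type") != "fastq-input":
            continue  # skip pipeline and process records
        name = record["input_filename"]
        path = Path(fastq_dir) / name
        results[name] = path.is_file() and sha256_of(path) == record["sha256"]
    return results
```

Reading in chunks keeps memory use flat even for large FASTQ files.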