Influenza_Pipeline

KDHE Influenza A pipeline, built to take MinION FASTQ files and produce clade analyses.

Primary language: Perl

General Package for State Lab Influenza A Clade Detection


The package contains the following workflows in their respective subdirectories:


  • Join each sample's individual FASTQ files into a single FASTQ file per sample
  • Map barcodes to sample IDs
  • Create and return a sample dictionary holding the path to each sample's FASTQ file

This step is required because the assembler accepts only one FASTQ file per sample,
and because each barcode must be mapped to its respective sample ID.
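The merge-and-map step can be sketched roughly as follows. This is a minimal illustration, not the pipeline's own (Perl) code. It assumes one subdirectory per barcode (for example `barcode01/`) holding uncompressed `*.fastq` chunks. The names `merge_fastqs` and `barcode_to_sample` are hypothetical.

```python
import shutil
from pathlib import Path


def merge_fastqs(run_dir, barcode_to_sample, out_dir):
    """Concatenate every *.fastq chunk under run_dir/<barcode>/ into
    out_dir/<sample_id>.fastq and return {sample_id: merged_path}.

    Hypothetical layout: one subdirectory per barcode, plain (not gzipped)
    FASTQ chunks that each end with a newline.
    """
    run_dir, out_dir = Path(run_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    samples = {}
    for barcode, sample_id in barcode_to_sample.items():
        parts = sorted((run_dir / barcode).glob("*.fastq"))
        if not parts:
            continue  # no reads for this barcode, so no entry in the dictionary
        merged = out_dir / f"{sample_id}.fastq"
        with merged.open("wb") as dst:
            for part in parts:
                with part.open("rb") as src:
                    shutil.copyfileobj(src, dst)  # stream; chunks can be large
        samples[sample_id] = str(merged)
    return samples
```

The returned dictionary is the "sample dictionary" described above: one merged FASTQ path per sample ID, ready to hand to the assembler.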



  • Run the IRMA assembler
  • After the consensus sequences are created, check the protein assembly statistics and write them to a JSON file

This step is required to obtain the influenza consensus sequences.
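The "stats to JSON" step can be pictured with a sketch like the one below. It assumes the assembler leaves one or more consensus `*.fasta` files in a directory. The metrics shown (length, N count, percent N) are hypothetical stand-ins for illustration only. They are simple nucleotide summaries, not the protein assembly checks the pipeline actually performs. The names `read_fasta` and `consensus_stats` are also hypothetical.

```python
import json
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def consensus_stats(consensus_dir, json_path):
    """Summarise every record in consensus_dir/*.fasta and write the result as JSON.

    The metrics are illustrative placeholders, not the pipeline's real checks.
    """
    stats = {}
    for fasta in sorted(Path(consensus_dir).glob("*.fasta")):
        for header, seq in read_fasta(fasta):
            seq = seq.upper()
            n_count = seq.count("N")  # ambiguous / low-coverage positions
            stats[header] = {
                "file": fasta.name,
                "length": len(seq),
                "n_count": n_count,
                "pct_n": round(100 * n_count / len(seq), 2) if seq else 0.0,
            }
    Path(json_path).write_text(json.dumps(stats, indent=2))
    return stats
```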



  • Open the HORIZON LIMS database (Oracle).
  • Join all demographic records on sample ID.
  • Join the demographics with the assembly statistics.
  • Push new demographic records to the Influenza DB (MS SQL Server) database.
  • Write the demographic information to the final results file.

This step is mandatory: the sample ID is the primary key in the database, so no results
from later workflows can be inserted until the sample's demographic record exists.
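The primary-key dependency above can be demonstrated with a small sketch. It uses an in-memory SQLite database as a stand-in for the Oracle LIMS and the MS SQL Server Influenza DB. The schema, the column names (`county`, `ha_length`), and the function names are all hypothetical. The point is only that once `sample_id` is the primary key and other tables reference it, a clade result for a sample with no demographic row is rejected.

```python
import sqlite3


def build_db():
    """In-memory stand-in for the Influenza DB (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.execute(
        "CREATE TABLE demographics ("
        "sample_id TEXT PRIMARY KEY, county TEXT, ha_length INTEGER)"
    )
    con.execute(
        "CREATE TABLE nextclade ("
        "sample_id TEXT REFERENCES demographics(sample_id), clade TEXT)"
    )
    return con


def push_demographics(con, lims_rows, assembly_stats):
    """Join LIMS rows with assembly stats on sample_id and insert them."""
    for row in lims_rows:
        stats = assembly_stats.get(row["sample_id"], {})
        con.execute(
            "INSERT INTO demographics VALUES (?, ?, ?)",
            (row["sample_id"], row["county"], stats.get("length")),
        )
    con.commit()
```

After `push_demographics`, inserting a `nextclade` row for a known sample succeeds. A row for an unknown sample ID raises `sqlite3.IntegrityError`.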



Workflow 3: Run Nextclade

  • Run Nextclade to determine the clade of each sequenced influenza virus
  • Check against the Influenza A H3N2 and H1N1pdm types (can be modified to include Flu B)
  • Parse the Nextclade output
  • Push the Nextclade results to the Influenza DB (MS SQL Server) database.
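Parsing the Nextclade output might look roughly like the sketch below. It assumes Nextclade was run with a tab-separated report, one run per subtype dataset. It reads the `seqName`, `clade` and `qc.overallStatus` columns of that report. It also assumes the FASTA headers are the sample IDs, which is a hypothetical convention. The function name and the record fields are illustrative, not the pipeline's own.

```python
import csv


def parse_nextclade_tsv(handle, subtype):
    """Return one record per sequence from a Nextclade TSV report.

    Assumes (hypothetically) that seqName equals the sample ID.
    """
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "sample_id": row["seqName"],
            "subtype": subtype,  # which dataset was used, e.g. "h3n2" or "h1n1pdm"
            "clade": row.get("clade") or None,  # empty cell -> no clade call
            "qc_status": row.get("qc.overallStatus") or None,
        })
    return records
```

Each record can then be written to the database keyed on `sample_id`. That only works once the demographic row from the previous workflow exists.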


Workflow 4: GISAID Report

  • Currently NOT IMPLEMENTED because the GISAID flu database is down.
  • Extract the required data from the Nextclade files.




Workflow 5: Build Epi Report

  • Prompt the user for the search to perform
  • Combine the Nextclade results with the demographic information to build the report
  • Format the consensus FASTA files
  • Align samples to the Influenza A H3N2 or H1N1pdm reference, chosen according to the Nextclade results
  • Build a phylogenetic tree for each Influenza A type from the alignment files
  • Clean up intermediate files
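The "format consensus FASTA, then align per subtype" steps above can be sketched as follows. Consensus sequences are grouped by their Nextclade subtype call, and one multi-FASTA per subtype is written for the alignment and tree-building tools. The sketch does not name or invoke those tools, since the README does not specify them. The formatting rules (upper-casing, stripping `-` gap characters, fixed line width) and the function name are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def write_subtype_fastas(consensus, subtype_calls, out_dir, width=60):
    """Group consensus sequences by subtype and write one multi-FASTA per
    subtype (e.g. h3n2.fasta, h1n1pdm.fasta). Returns {subtype: path}.

    consensus:     {sample_id: sequence}
    subtype_calls: {sample_id: subtype} taken from the Nextclade results
    """
    groups = defaultdict(list)
    for sample_id, seq in consensus.items():
        subtype = subtype_calls.get(sample_id)
        if subtype is None:
            continue  # no Nextclade call: leave the sample out of the trees
        # Hypothetical formatting: upper-case and drop gaps before re-aligning.
        groups[subtype].append((sample_id, seq.upper().replace("-", "")))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for subtype, records in sorted(groups.items()):
        path = out_dir / f"{subtype}.fasta"
        with path.open("w") as fh:
            for sample_id, seq in records:
                fh.write(f">{sample_id}\n")
                for i in range(0, len(seq), width):  # wrap at a fixed width
                    fh.write(seq[i:i + width] + "\n")
        written[subtype] = path
    return written
```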