General Package for State Lab Influenza A Clade Detection

The package contains the following workflows in their respective subdirectories:

Workflow 0: Merge Sequencing Data

Joins each sample's individual fastq files into a single fastq file per sample
Maps barcodes to sample IDs
Creates and returns a sample dictionary holding the path to each merged fastq file

This step is required because the assembler accepts only one fastq file per sample, and because each sample must be mapped to its respective sample ID.
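
The merge-and-map step can be sketched as follows. This is a minimal illustration, not the package's code. It assumes a run directory with one subfolder per barcode (e.g. barcode01/) holding fastq or fastq.gz chunks, which is a common Nanopore demultiplexing layout. It also assumes a hypothetical two-column CSV (barcode,sample_id). The function names are illustrative.

```python
import csv
import gzip
import shutil
from pathlib import Path


def load_barcode_map(csv_path):
    """Read a hypothetical two-column CSV (barcode,sample_id) into a dict."""
    with open(csv_path, newline="") as handle:
        return {row["barcode"]: row["sample_id"] for row in csv.DictReader(handle)}


def merge_fastqs(run_dir, barcode_map, out_dir):
    """Concatenate every fastq chunk in run_dir/<barcode>/ into one file per sample.

    Returns the sample dictionary {sample_id: Path to merged fastq}.
    """
    run_dir, out_dir = Path(run_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    samples = {}
    for barcode, sample_id in barcode_map.items():
        chunks = sorted((run_dir / barcode).glob("*.fastq*"))
        if not chunks:
            continue  # barcode produced no reads in this run
        merged = out_dir / f"{sample_id}.fastq"
        with open(merged, "wb") as out:
            for chunk in chunks:
                # Decompress gzipped chunks on the fly; plain fastq is copied as-is.
                opener = gzip.open if chunk.suffix == ".gz" else open
                with opener(chunk, "rb") as src:
                    shutil.copyfileobj(src, out)
        samples[sample_id] = merged
    return samples
```

Because the merged file is named after the sample ID rather than the barcode, the returned dictionary can be passed straight to the assembler step.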

Workflow 1: Assemble Reads using IRMA

Runs the IRMA assembler
After the consensus sequences are created, checks the protein assembly stats and writes them to a JSON file

This step is required to obtain the influenza consensus sequences.
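
A rough sketch of this step follows. It builds the IRMA call in its general form, IRMA <module> <reads.fastq> <output_name>. It then summarizes each consensus FASTA record and writes the summary to JSON. The module name "FLU" is an assumption, since Nanopore runs may use a different IRMA module. The per-segment stats computed here (length, N count, percent of A/C/G/T calls) are a hypothetical stand-in for whatever assembly stats the package actually records.

```python
import json
import subprocess


def irma_command(fastq, sample_id, module="FLU"):
    """Build an IRMA call; the module name is an assumption."""
    return ["IRMA", module, str(fastq), sample_id]


def run_irma(fastq, sample_id, module="FLU"):
    # check=True raises CalledProcessError if IRMA exits non-zero.
    subprocess.run(irma_command(fastq, sample_id, module), check=True)


def parse_fasta(text):
    """Return {header: sequence} for a FASTA string."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def consensus_stats(fasta_text):
    """Per-record length, N count and percentage of unambiguous A/C/G/T calls."""
    stats = {}
    for name, seq in parse_fasta(fasta_text).items():
        upper = seq.upper()
        called = sum(upper.count(base) for base in "ACGT")
        stats[name] = {
            "length": len(seq),
            "n_count": upper.count("N"),
            "pct_called": round(100 * called / len(seq), 2) if seq else 0.0,
        }
    return stats


def write_stats_json(stats, path):
    with open(path, "w") as handle:
        json.dump(stats, handle, indent=2)
```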

Workflow 2: Import demographics

Opens the HORIZON LIMS database (Oracle).
Joins all demographics with their sample IDs.
Joins the demographics with the assembly stats.
Pushes new demographics to the Influenza DB (MS SQL) database.
Writes demographic information for the final result file

This step is absolutely required: the sample ID is the primary key in the database, so without it
no results from later workflows can be inserted.
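
The dependency on the sample ID key can be illustrated with an in-memory SQLite database standing in for the MS SQL Influenza DB. This is a sketch under assumptions. The table and column names (demographics, nextclade_results, collection_date, county, ha_coverage) are hypothetical, and the real schema is not shown here. The code joins LIMS demographics to assembly stats by sample ID and inserts only new samples. A downstream result for an unknown sample ID is then rejected by the foreign key.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables: sample_id is the primary key later results depend on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE demographics (
            sample_id TEXT PRIMARY KEY,
            collection_date TEXT,
            county TEXT,
            ha_coverage REAL
        );
        CREATE TABLE nextclade_results (
            sample_id TEXT PRIMARY KEY REFERENCES demographics(sample_id),
            clade TEXT
        );
        """
    )


def join_demographics(demographics, assembly_stats):
    """Attach assembly stats to each LIMS demographics record by sample ID.

    Samples without assembly stats are kept with ha_coverage = None.
    """
    rows = []
    for sample_id, demo in demographics.items():
        stats = assembly_stats.get(sample_id, {})
        rows.append((sample_id, demo.get("collection_date"), demo.get("county"),
                     stats.get("ha_coverage")))
    return rows


def push_demographics(conn, rows):
    # INSERT OR IGNORE skips sample IDs already present, so only new rows land.
    conn.executemany("INSERT OR IGNORE INTO demographics VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

With foreign keys enabled, inserting a nextclade_results row whose sample_id has no demographics row raises sqlite3.IntegrityError. That is the failure the paragraph above warns about.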

Workflow 3: Run Nextclade

Runs Nextclade to determine the clade of each sequenced influenza virus
Checks against the Influenza A H3N2 and H1N1pdm types (can be modified to include Flu B)
Parses the Nextclade data
Pushes the Nextclade data to the Influenza DB (MS SQL) database.
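
One way to run and parse Nextclade per subtype is sketched below. It assumes the Nextclade v2/v3 CLI form nextclade run --input-dataset <dir> --output-tsv <file> <fasta>. It also assumes one locally downloaded dataset directory per subtype (H3N2, H1N1pdm). Only the seqName, clade and qc.overallStatus columns of the TSV output are read. Treating seqName as the sample ID and using the (sample_id, subtype, clade, qc_status) row shape are assumptions made for illustration.

```python
import csv
import io
import subprocess


def nextclade_command(fasta, dataset_dir, out_tsv):
    """Build a Nextclade CLI call (v2/v3 syntax assumed)."""
    return ["nextclade", "run", "--input-dataset", str(dataset_dir),
            "--output-tsv", str(out_tsv), str(fasta)]


def run_nextclade(fasta, dataset_dir, out_tsv):
    # check=True raises CalledProcessError if Nextclade exits non-zero.
    subprocess.run(nextclade_command(fasta, dataset_dir, out_tsv), check=True)


def parse_nextclade_tsv(tsv_text, subtype):
    """Turn Nextclade TSV output into rows (sample_id, subtype, clade, qc_status)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        clade = rec.get("clade") or None  # empty when no clade could be assigned
        rows.append((rec["seqName"], subtype, clade, rec.get("qc.overallStatus")))
    return rows
```

Running each subtype's dataset separately and tagging rows with the subtype keeps H3N2 and H1N1pdm calls apart. Adding Flu B would mean adding one more dataset directory.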

Workflow 4: GISAID Report

Extracts the required data from the Nextclade files.

Currently NOT IMPLEMENTED because the GISAID Flu DB is down.

Workflow 5: Build Epi Report

Asks the user which search to perform
Takes both the Nextclade results and the demographic information and builds the report
Formats the consensus fasta files
Aligns samples to the reference Influenza A H3N2 or H1N1pdm type based on the Nextclade results
Builds a phylogenetic tree for each Influenza A type based on the alignment files
Cleans up intermediate files
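
The fasta formatting and subtype routing that come before alignment might look like the sketch below. It groups consensus sequences by the subtype Nextclade assigned and writes one wrapped, uppercased multi-FASTA per subtype. Each file is then ready for an aligner and tree builder; this README does not name either tool, so neither is shown. The input dictionaries, function names and 60-column wrap width are assumptions.

```python
from collections import defaultdict
from pathlib import Path


def group_by_subtype(consensus, subtypes):
    """Group {sample_id: sequence} by the subtype Nextclade assigned.

    Samples with no subtype call are left out of every group.
    """
    groups = defaultdict(dict)
    for sample_id, seq in consensus.items():
        subtype = subtypes.get(sample_id)
        if subtype:
            groups[subtype][sample_id] = seq
    return dict(groups)


def format_fasta(records, width=60):
    """Render {name: seq} as FASTA text, uppercased and wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        seq = seq.upper()
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def write_subtype_fastas(groups, out_dir):
    """Write one <subtype>.fasta per group and return {subtype: path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for subtype, records in groups.items():
        path = out_dir / f"{subtype}.fasta"
        path.write_text(format_fasta(records))
        paths[subtype] = path
    return paths
```

Keeping one file per subtype means each alignment, and therefore each tree, contains only samples of a single Influenza A type.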

AdrianLimaG/Influenza_Pipeline