/spora

:mushroom: spora: Streamlined Phylogenomic Outbreak Report Analysis

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

spora: Streamlined Phylogenomic Outbreak Report Analysis

PyPI version example workflow

snakemake and Python integrated workflow for intermediate file generation for COVID outbreak analysis

Installation

git clone https://github.com/matt-sd-watson/spora.git
conda env create -f ncov_spora/environments/environment.yml
conda activate ncov_spora
cd spora
pip install . 

Updating

conda activate ncov_spora
cd ~/spora
git checkout main
git pull
pip install . 

Usage

usage: 
    	spora -c <config.yaml> 
    	OR
    	spora --focal_list ...<input args>

spora: Streamlined Phylogenomic Outbreak Report Analysis

optional arguments:
  -h, --help            Show the help output and exit.
  -c CONFIG, --config CONFIG
                        Input config file in yaml format, all command line arguments can be passed via the config file.
  -f FOCAL_SEQS, --focal-sequences FOCAL_SEQS
                        Input .txt list or multi-FASTA focal samples for outbreak. Required
  -b BACKGROUND_SEQS, --background-sequences BACKGROUND_SEQS
                        Optional input .txt list or multi-FASTA background samples to add to analysis
  -m MASTER_FASTA, --master-fasta MASTER_FASTA
                        Master FASTA of genomic sequences to select from. Required if either --focal-sequences or --background-sequences are not supplied in FASTA format
  -o OUTDIR, --output-directory OUTDIR
                        Path to the desired output directory. If none is provided, a new folder named spora will be created in the current directory
  -r REFERENCE, --reference REFERENCE
                        .gb file containing the desired COVID-19 reference sequence. Required
  -p PREFIX, --prefix PREFIX
                        Prefix string to label all output files. Default: outbreak
  -t NTHREADS, --nthreads NTHREADS
                        Number of threads to use for processing. Default: 2
  -s, --snps-only       Generate a snps-only FASTA from the input FASTA. Default: False
  -rn, --rename         Rename the FASTA headers to be compatible with NML standards. Default: False
  -nc NAMES_CSV, --names-csv NAMES_CSV
                        Use the contents of a CSV to rename the input FASTA. Requires the following column headers: original_name, new_name
  -ncs, --no-constant-sites
                        Do not enable constant sites to be used for SNPs only tree generation. Default: Enabled
  -fi, --filter         Filter both the focal and background sequences based on genome completeness and length. Default: Not enabled
  -gc GENOME_COMPLETENESS, --genome-completeness GENOME_COMPLETENESS
                        Integer for the minimum genome completeness percentage for filtering. Default: 90
  -gl GENOME_LENGTH, --genome-length GENOME_LENGTH
                        Integer for the minimum genome length for filtering. Default: 29500
  -rp, --report         Generate a summary output report for the spora run. Default: Not enabled
  -v, --version         Show the current spora version then exit.

Documentation

More detailed documentation for spora usage and functionality can be found here

Acknowledgments

Inspiration for code structure and design for spora was inspired by pangolin and civet, and minor code blocks were adopted from these software.

The Background section in the documentation describing outbreak definitions was written by Mark Horsman.