Strain-level haplotyping for metagenomes with short or long reads.

Primary language: Rust. License: MIT.

floria - metagenomic long or short-read strain haplotype phasing

Introduction

Floria is a software package for recovering microbial haplotypes and clustering reads at the strain level from metagenomic sequencing data. See the introduction here for more information.

After calling SNPs against reference genomes or a metagenomic assembly, floria produces 1) strain-level clusters of short or long reads and 2) their haplotypes in minutes.

A 1Mbp contig (Brevefilum fermentans) was automatically phased into two strains (top: y-axis is coverage). Only two strains are present with high HAPQ; spurious "haplosets" are given low HAPQ.

Inputs

Floria requires:

  1. a list of variants in .vcf format
  2. a set of reads mapped to assembled contigs/references in .bam format

If you do not know how to get started with generating VCFs or BAMs, see the "Floria-PL" pipeline here, which runs from reads to haplotypes.
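Before launching a run, it can help to confirm that both inputs are actually in place. The following is a minimal shell sketch, not part of floria: `check_inputs` is a hypothetical helper that only tests whether each file exists and is non-empty, and warns when an extension differs from the `.vcf` and `.bam` formats listed above. It does not validate file contents.

```shell
# Hypothetical helper, not part of floria: sanity-check the two required inputs.
# Usage: check_inputs variants.vcf mapped_reads.bam
check_inputs() {
    vcf=$1
    bam=$2
    status=0
    for f in "$vcf" "$bam"; do
        # -s is true only for files that exist and are non-empty
        if [ ! -s "$f" ]; then
            echo "error: $f is missing or empty"
            status=1
        fi
    done
    # Extension mismatches are only warnings; they do not change the status.
    case "$vcf" in *.vcf) ;; *) echo "warning: $vcf does not end in .vcf" ;; esac
    case "$bam" in *.bam) ;; *) echo "warning: $bam does not end in .bam" ;; esac
    if [ "$status" -eq 0 ]; then
        echo "inputs present"
    fi
    return "$status"
}
```

For example, `check_inputs tests/test.vcf tests/test_long.bam` could be run from the repository root before the quick-start command.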

Outputs, tutorials, and manuals (full documentation)

See https://phase-doc.readthedocs.io/en/latest/index.html for more information on tutorials, outputs, and extra manuals for usage.

Install + Quick start

Option 1 - compile from scratch

A relatively recent standard toolchain is needed.

  1. Rust version > 1.63.0 and associated tools such as cargo are required and assumed to be in PATH.
  2. CMake version > 3.12 is required. It is sufficient to download the binary from the link and run PATH="/path/to/cmake-3.xx.x-linux-x86_64/bin/:$PATH" before installation.
  3. make
  4. GCC
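The requirements above can be checked before compiling. This is a rough sketch under a few assumptions: `version_gt` and `report_tool` are hypothetical helper names, each tool is assumed to print a dotted version number in its `--version` output, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# version_gt A B: succeed when dotted version A is strictly greater than B.
version_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 1                             # equal versions are not "greater"
    }'
}

# report_tool TOOL MINIMUM: print whether TOOL is newer than MINIMUM.
report_tool() {
    tool=$1
    minimum=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found in PATH"
        return 0
    fi
    # Take the first dotted number in the --version output as the version.
    found=$("$tool" --version 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if version_gt "$found" "$minimum"; then
        echo "$tool: $found satisfies > $minimum"
    else
        echo "$tool: $found does not satisfy > $minimum"
    fi
}

report_tool rustc 1.63.0
report_tool cmake 3.12
# No minimum versions are stated for these, so only check that they exist.
for tool in cargo make gcc; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool: not found in PATH"
done
```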

If you're using an x86-64 architecture with SSE instructions (most Linux systems):

git clone https://github.com/bluenote-1577/floria
cd floria

cargo install --path . 
floria -h # binary is available in PATH

If you're using an ARM architecture with NEON instructions (e.g. Mac M1):

# If using ARM architecture with NEON instructions
cargo install --path . --root ~/.cargo --features=neon --no-default-features
floria -h # binary is available in PATH
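If you are unsure which of the two build commands applies, the machine name reported by `uname -m` is one way to decide. The sketch below is an illustration only: `floria_install_args` is a hypothetical helper, the mapping from machine names to the two cases is an assumption, and the flags are copied from the commands above. It prints the command instead of running it.

```shell
# Hypothetical helper: map a `uname -m` machine name to the cargo install
# arguments listed above. Prints nothing and fails for other machines.
floria_install_args() {
    case "$1" in
        x86_64|amd64)
            echo "--path ."
            ;;
        arm64|aarch64)
            echo "--path . --root $HOME/.cargo --features=neon --no-default-features"
            ;;
        *)
            return 1
            ;;
    esac
}

# Dry run: show the command for this machine without executing it.
if args=$(floria_install_args "$(uname -m)"); then
    echo "cargo install $args"
else
    echo "no documented build flags for $(uname -m)"
fi
```

Run the printed command from inside the cloned floria directory, since `--path .` refers to the current directory.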

Option 2 - bioconda

conda install -c bioconda floria

Option 3 - precompiled static binary on x86-64-linux

The static binary is currently available only for x86-64 Linux with SSE instructions.

wget https://github.com/bluenote-1577/floria/releases/download/latest/floria
chmod +x floria
./floria -h

Quick Start after install

git clone https://github.com/bluenote-1577/floria
cd floria

# run floria on mock data
floria -b tests/test_long.bam -v tests/test.vcf -r tests/MN-03.fa -o 3_klebsiella_strains
ls 3_klebsiella_strains

# visualize strain "vartigs" if you have matplotlib
python scripts/visualize_vartigs.py 3_klebsiella_strains/NZ_CP081897.1/NZ_CP081897.1.vartigs
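When the output folder holds several contigs, the visualization step can be repeated for each one. This sketch assumes every contig folder follows the layout of the single example path above (`OUTDIR/<contig>/<contig>.vartigs`). That layout is inferred from that one example, and `list_vartigs` is a hypothetical helper. The loop prints the commands rather than running them.

```shell
# list_vartigs OUTDIR: print every OUTDIR/<contig>/<contig>.vartigs file found.
list_vartigs() {
    outdir=$1
    for dir in "$outdir"/*/; do
        contig=$(basename "$dir")
        file="$outdir/$contig/$contig.vartigs"
        # Skip folders without a matching .vartigs file
        # (this also covers the case where the glob matched nothing).
        if [ -f "$file" ]; then
            echo "$file"
        fi
    done
}

# Dry run: print one visualization command per contig.
list_vartigs 3_klebsiella_strains | while read -r vartigs; do
    echo "python scripts/visualize_vartigs.py $vartigs"
done
```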

Citation

*Co-lead authors

Jim Shaw*, Jean-Sebastien Gounot*, Hanrong Chen, Niranjan Nagarajan, Yun William Yu. Floria: Fast and accurate strain haplotyping in metagenomes (2024). Bioinformatics.