
R package for inferring multiple infections from high-throughput sequencing data



moimix: an R package for evaluating multiplicity of infection in malaria parasites

http://bahlolab.github.io/moimix/

Features

  • Estimate multiplicity of infection from massively parallel sequencing data
  • Estimate heterozygosity and within-isolate diversity directly from read counts
  • Call major alleles within isolates from B-allele frequencies
  • Prepare SNP barcode data for use in COIL
  • Simulate single nucleotide variant data with known multiplicity of infection
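To make the read-count features above concrete, here is a minimal base-R sketch of the quantities involved. It does not call moimix. It computes the B-allele frequency (BAF) of each SNP as the alternate-read fraction, a per-site heterozygosity of 2p(1 - p) averaged into a single within-isolate diversity summary, and a naive major-allele call. The read counts are hypothetical, and these are simple textbook estimators that may differ from the ones moimix implements.

```r
# Hypothetical read counts for five SNPs in one isolate:
# column 1 = reads supporting the reference allele,
# column 2 = reads supporting the alternate allele.
counts <- matrix(c(30,  0,
                   12, 18,
                    0, 25,
                   20,  5,
                   10, 10),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("snp", 1:5), c("ref", "alt")))

# B-allele frequency: proportion of reads carrying the alternate allele
# (a site with zero coverage would give NaN and should be filtered first)
baf <- counts[, "alt"] / rowSums(counts)

# Per-site heterozygosity within the isolate, 2p(1 - p)
het <- 2 * baf * (1 - baf)

# Within-isolate diversity summarised as mean heterozygosity across sites
mean_het <- mean(het)

# Naive major-allele call: the allele with the larger read fraction;
# exact ties (BAF = 0.5) are left as NA rather than forced to one allele
major <- ifelse(baf > 0.5, "alt", ifelse(baf < 0.5, "ref", NA))

print(data.frame(baf, het, major))
print(mean_het)
```

A fully clonal isolate would have BAFs at 0 or 1 and a mean heterozygosity near zero. Intermediate BAFs, as at `snp2`, `snp4` and `snp5` here, are what suggest a mixed infection.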

How do I install moimix?

There are plans to release moimix on Bioconductor in the future. For now, it can only be installed as a development version from GitHub:

# BiocManager installs the Bioconductor dependencies and, through the
# remotes package, the development version of moimix from GitHub
install.packages(c("BiocManager", "remotes"))
BiocManager::install("bahlolab/moimix", build_vignettes = TRUE)

What input data does moimix require?

moimix works with the Genomic Data Structure (GDS) format used by the Bioconductor package SeqArray, which provides fast access in R to variant data from VCF files.

To convert a VCF file to GDS format:

library(SeqArray)
# read the (optionally bgzipped) VCF and write it out as a GDS file
seqVCF2GDS("isolate_snps.vcf.gz", "isolate_snps.gds")

It is also possible to estimate MOI from a matrix of read counts where the first column is the number of reads supporting the reference allele and the second column is the number of reads supporting the alternate allele.
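One generic way to analyse such a matrix is to fit a binomial mixture to the alternate-allele counts. In a mixed infection, sites where the strains differ yield BAFs that cluster around intermediate values set by the strain proportions. The sketch below is a plain expectation-maximisation (EM) fit written for illustration. `fit_binom_mixture` is a hypothetical helper, not part of the moimix API, and comparing the number of components by BIC is an assumption of this example rather than a description of how moimix works.

```r
# Hypothetical helper: fit a k-component binomial mixture by EM.
# counts: two-column matrix (ref reads, alt reads), one row per SNP.
fit_binom_mixture <- function(counts, k, max_iter = 500, tol = 1e-8) {
  alt <- counts[, 2]
  depth <- counts[, 1] + counts[, 2]
  n <- length(alt)
  p <- seq_len(k) / (k + 1)   # component alt-allele probabilities
  w <- rep(1 / k, k)          # mixing weights
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: log(w_j) + log Binom(alt | depth, p_j) for every SNP/component
    logd <- sapply(seq_len(k), function(j)
      log(w[j]) + dbinom(alt, depth, p[j], log = TRUE))
    logd <- matrix(logd, nrow = n)      # keep n x k shape in edge cases
    row_max <- apply(logd, 1, max)      # log-sum-exp for numerical stability
    resp <- exp(logd - row_max)
    row_sum <- rowSums(resp)
    resp <- resp / row_sum              # posterior responsibilities
    loglik <- sum(row_max + log(row_sum))
    # M-step: update weights and probabilities (kept away from 0 and 1)
    w <- colMeans(resp)
    p <- colSums(resp * alt) / colSums(resp * depth)
    p <- pmin(pmax(p, 1e-6), 1 - 1e-6)
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  # free parameters: k probabilities plus k - 1 weights
  list(weights = w, p = p, loglik = loglik,
       bic = -2 * loglik + (2 * k - 1) * log(n))
}

# Simulated two-strain infection (proportions 0.3 / 0.7), using only sites
# where the strains differ; depth fixed at 100 reads per SNP
set.seed(1)
n_snps <- 200
depth <- rep(100, n_snps)
true_p <- rep(c(0.3, 0.7), each = n_snps / 2)
alt <- rbinom(n_snps, depth, true_p)
counts <- cbind(ref = depth - alt, alt = alt)

fit1 <- fit_binom_mixture(counts, k = 1)
fit2 <- fit_binom_mixture(counts, k = 2)
print(round(fit2$p, 2))
print(c(k1 = fit1$bic, k2 = fit2$bic))
```

In this simulation, the two-component fit should place its component probabilities near 0.3 and 0.7 and achieve a lower BIC than the one-component fit. Real isolates also contain invariant sites and sequencing error, so treat this as an illustration of the idea rather than a ready-made MOI estimator.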

How do I use moimix?

See the introduction vignette for usage examples.