DEC pathotype assignment of E. coli genomes in R.
Diagnostic microbiology has developed several schemes to subtype Escherichia coli. These are useful for understanding the epidemiology and pathogenesis of particular strains. Diarrheagenic E. coli (DEC) pathotypes group E. coli strains which possess similar virulence factors (VF) and cause diseases with similar pathology. Whole-genome sequencing can predict the presence of VF genes in an E. coli isolate with high accuracy, permitting the assignment of DEC pathotypes without additional molecular, biochemical, or phenotypic assays.
PathotypeR assigns E. coli genomes a DEC pathotype based on the presence/absence of specific VF genes. It takes the output of AMRFinderPlus as its input.
PathotypeR includes two functions:
- pathotypeR(): Quantifies each sample's VFs and assigns a DEC pathotype. Can also return the total VF count per sample, VF presence/absence, the prevalence of each pathotype, and the prevalence of each VF.
- amrfinder_process(): Merges AMRFinderPlus output files into a single dataframe. Called by pathotypeR(), but can also be used on its own.
Genomes are assigned a DEC pathotype based on the presence/absence of specific VF genes. Namely:
- Shiga toxin-producing E. coli (STEC): stx1 and/or stx2 (without eae)
- Enteropathogenic E. coli (EPEC): eae and/or bfpA (without stx1 or stx2)
- Enterohaemorrhagic E. coli (EHEC): stx1 and/or stx2, and eae
- Enteroinvasive E. coli (EIEC): ipaH
- Enterotoxigenic E. coli (ETEC): ltcA and/or sta1
- Enteroaggregative E. coli (EAEC): aatA and/or aaiC and/or aggR
- Diffusely adherent E. coli (DAEC): afaC and/or afaE
- none: does not encode any of the above VF genes
Hybrid strains contain genes associated with multiple DEC pathotypes and will be reported as all pathotypes detected (e.g., STEC-EAEC, EPEC-ETEC).
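The rules above can be sketched as a small R function. This is an illustrative sketch, not the PathotypeR source: the function name assign_pathotype_sketch is hypothetical, it assumes the detected genes arrive as a character vector that matches the symbols listed above exactly, and the order in which hybrid labels are joined is an assumption. Real AMRFinderPlus output may name toxin subunits or alleles differently, so the matching would likely need adapting.

```r
# Hypothetical helper (not part of PathotypeR): assign a DEC pathotype label
# from a character vector of VF gene symbols detected in one genome.
assign_pathotype_sketch <- function(genes) {
  # TRUE if any of the given symbols is among the detected genes
  has <- function(x) any(x %in% genes)
  stx <- has(c("stx1", "stx2"))
  eae <- has("eae")

  calls <- character(0)
  if (stx && eae)  calls <- c(calls, "EHEC")
  if (stx && !eae) calls <- c(calls, "STEC")
  if (!stx && has(c("eae", "bfpA"))) calls <- c(calls, "EPEC")
  if (has("ipaH")) calls <- c(calls, "EIEC")
  if (has(c("ltcA", "sta1"))) calls <- c(calls, "ETEC")
  if (has(c("aatA", "aaiC", "aggR"))) calls <- c(calls, "EAEC")
  if (has(c("afaC", "afaE"))) calls <- c(calls, "DAEC")

  # Hybrids are reported as every pathotype detected, joined with "-"
  if (length(calls) == 0) "none" else paste(calls, collapse = "-")
}

assign_pathotype_sketch(c("stx2", "aggR"))  # "STEC-EAEC"
assign_pathotype_sketch(c("eae", "sta1"))   # "EPEC-ETEC"
assign_pathotype_sketch(character(0))       # "none"
```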
NOTE: PathotypeR does NOT assign pathotypes that are defined by collection site or association with disease, such as extraintestinal pathogenic E. coli (ExPEC), uropathogenic E. coli (UPEC), neonatal meningitis-associated E. coli (NMEC), and sepsis-associated E. coli (SEPEC).
Install directly from GitHub:
source("https://raw.github.com/kevinsblake/PathotypeR/main/pathotype.R")
Alternatively, download the script and source it from its local filepath:
source("dir1/dir2/pathotype.R")
E. coli genomes of interest must first be run through AMRFinderPlus. See their instructions for recommended usage.
The AMRFinderPlus output must be saved as a .tsv file. This can be done using the output flag: -o ${outdir}/${sample}.tsv. Copy all of these output files into one directory. The filepath of this directory will be the input for PathotypeR.
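Before running PathotypeR, it can help to confirm that the directory holds the files you expect. The sketch below uses only base R. The function name list_amrfinder_outputs and its column names are hypothetical and not part of PathotypeR.

```r
# Hypothetical pre-flight check (not part of PathotypeR): list the files in
# `indir` that end in `suffix`, with the sample name implied by each filename.
list_amrfinder_outputs <- function(indir, suffix = ".tsv") {
  files <- list.files(indir, full.names = TRUE)
  files <- files[endsWith(files, suffix)]
  fnames <- basename(files)
  data.frame(
    # Sample name = filename minus the suffix
    sample = substr(fnames, 1, nchar(fnames) - nchar(suffix)),
    file = files,
    stringsAsFactors = FALSE
  )
}

# Example: inspect the directory that will be passed to pathotypeR()
found <- list_amrfinder_outputs("data/amrfinder")
nrow(found)  # number of AMRFinderPlus output files detected
```

A result with zero rows means no files with that suffix were found, which usually points to a wrong path or a different file extension.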
Function for assigning DEC pathotype to E. coli genomes. First calls amrfinder_process().
library(dplyr)
pathotypeR(indir, output=c("patho_pred", "patho_prev", "vf_pres", "vf_prev"))
indir
Filepath to directory containing AMRFinderPlus output files.
output
Specifies which table to return. Default is patho_pred.
- patho_pred: for each sample, the VF count and pathotype prediction
- patho_prev: for each pathotype, the count (i.e. number of samples) and overall prevalence
- vf_pres: for each sample, VF presence/absence (1 = present, 0 = absent)
- vf_prev: for each VF, the count and overall prevalence
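To make the difference between the presence/absence and prevalence outputs concrete, here is a toy calculation in base R. It is a sketch under stated assumptions, not the package's code: the input tables, column names, and pathotype labels are invented for illustration.

```r
# Toy input: one row per (sample, VF gene) detection. Column names are
# illustrative and not taken from PathotypeR.
hits <- data.frame(
  sample = c("s1", "s1", "s2", "s3"),
  gene   = c("stx1", "eae", "ipaH", "eae"),
  stringsAsFactors = FALSE
)

# Presence/absence table: one row per sample, 1 = present, 0 = absent
vf_pres <- (table(hits$sample, hits$gene) > 0) * 1

# Per-VF summary: number of samples carrying each VF and the share of samples
vf_count <- colSums(vf_pres)
vf_prev <- data.frame(
  gene = names(vf_count),
  count = as.integer(vf_count),
  prevalence = as.numeric(vf_count) / nrow(vf_pres)
)

# Per-pathotype summary from a toy vector of per-sample predictions
patho <- c(s1 = "EHEC", s2 = "EIEC", s3 = "EPEC")
patho_count <- table(patho)
patho_prev <- data.frame(
  pathotype = names(patho_count),
  count = as.integer(patho_count),
  prevalence = as.numeric(patho_count) / length(patho)
)
```

In this toy version a sample with no VF hits never appears in hits, so it would be missing from the denominator. A real implementation has to carry the full sample list separately.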
# Outputs just sample names, VF count, and pathotype prediction
df <- pathotypeR("data/amrfinder")
# Outputs all VFs detected, count, and overall prevalence
df <- pathotypeR("data/amrfinder", output="vf_prev")
Function for merging AMRFinderPlus output files into a single dataframe.
amrfinder_process(indir, suffix=".tsv")
indir
Filepath to directory containing AMRFinderPlus output files.
suffix
Specifies the suffix added to the amrfinder output filename. The filename minus this suffix should be the same as the Name column in the amrfinder output. Default is ".tsv".
df <- amrfinder_process("data/amrfinder")
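For readers who want to see what such a merge involves, a minimal base R sketch follows. It is not the implementation of amrfinder_process(): the function name merge_amrfinder_sketch and the added sample column are hypothetical, and the sketch assumes every file has the same columns.

```r
# Hypothetical re-implementation sketch (not the package's code): read every
# file in `indir` ending in `suffix` and stack them into one data frame.
merge_amrfinder_sketch <- function(indir, suffix = ".tsv") {
  files <- list.files(indir, full.names = TRUE)
  files <- files[endsWith(files, suffix)]

  tables <- lapply(files, function(f) {
    # Tab-separated with a header row; keep column names exactly as written
    df <- read.delim(f, stringsAsFactors = FALSE, check.names = FALSE)
    # Record the source of each row: filename minus the suffix
    fname <- basename(f)
    df$sample <- rep(substr(fname, 1, nchar(fname) - nchar(suffix)), nrow(df))
    df
  })

  # rbind requires matching columns; returns NULL if no files were found
  do.call(rbind, tables)
}

merged <- merge_amrfinder_sketch("data/amrfinder")
```

Deriving the sample label from the filename mirrors the suffix convention described above, where the filename minus the suffix should match the Name column.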
- Horesh et al. A comprehensive and high-quality collection of Escherichia coli genomes and their genes. Microb Genom. 2021 Feb;7(2):000499. doi: 10.1099/mgen.0.000499. PMID: 33417534.
- Jesser & Levy. Updates on defining and detecting diarrheagenic Escherichia coli pathotypes. Curr Opin Infect Dis. 2020 Oct; 33(5): 372–380. doi: 10.1097/QCO.0000000000000665. PMID: 32773499.
- Robins-Browne et al. Are Escherichia coli pathotypes still relevant in the era of whole-genome sequencing? Front Cell Infect Microbiol. 2016 Nov 18;6:141. doi: 10.3389/fcimb.2016.00141. PMID: 27917373.