Scripts, data and outputs of a paper on the genomic epidemiology of E. coli ST58
This repository allows conscientious readers of the ST58 manuscript to reproduce all the data processing, statistics and figures presented in the paper.
It comprises two directories, scripts and raw_data (the contents of which should be self-explanatory), and generates an outputs folder with figures and data subdirectories.
These scripts are currently functional on macOS Big Sur 11.6.1 using RStudio 1.4.1106 and R version 4.0.5. We cannot guarantee they will work with other versions of R or RStudio. Your operating system should not be an issue, provided you use these versions of R and RStudio.
You will need to install the following packages and versions to work with the scripts:
- data.table_1.14.0
- tidyverse_1.3.1
- magrittr_2.0.1
- rstatix_0.7.0
- RColorBrewer_1.1-2
- ggtree_3.1.0
- pheatmap_1.0.12
- reshape2_1.4.4
- ggpubr_0.4.0
- ggplot2_3.3.5
- tibble_3.1.5
- purrr_0.3.4
- readr_2.0.1
- stringr_1.4.0
- forcats_0.5.1
- tidyr_1.1.4
- dplyr_1.0.7
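The pins above follow the same name_version form that R's sessionInfo() prints. As a rough sketch, the shell loop below splits each pin at its last underscore and prints a matching remotes::install_version() call that can be pasted into the R console. It assumes the remotes package is installed and that each pinned version is still retrievable from CRAN. ggtree is left out because it is distributed through Bioconductor rather than CRAN. The helper name pin_to_install is ours and is not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: turn a "name_version" pin into an R install command.
pin_to_install() {
  pin=$1
  name=${pin%_*}      # everything before the last underscore
  version=${pin##*_}  # everything after the last underscore
  printf 'remotes::install_version("%s", version = "%s")\n' "$name" "$version"
}

# CRAN pins copied from the list above (ggtree omitted, see the note above).
for pin in data.table_1.14.0 tidyverse_1.3.1 magrittr_2.0.1 rstatix_0.7.0 \
           RColorBrewer_1.1-2 pheatmap_1.0.12 reshape2_1.4.4 ggpubr_0.4.0 \
           ggplot2_3.3.5 tibble_3.1.5 purrr_0.3.4 readr_2.0.1 stringr_1.4.0 \
           forcats_0.5.1 tidyr_1.1.4 dplyr_1.0.7; do
  pin_to_install "$pin"
done
```

Splitting at the last underscore rather than the first keeps any package name that itself contains an underscore intact, and it leaves hyphenated versions such as 1.1-2 untouched.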
There have recently been some issues with the way ggtree interacts with dplyr. The solution is to install the latest version of ggtree directly from GitHub instead of via BiocManager. You can do this in the RStudio console with the remotes package like so:
install.packages("remotes")
remotes::install_github("YuLab-SMU/ggtree")
Clone this repository
git clone https://github.com/CJREID/ST58_project.git
cd ST58_project
pwd
Open the data_vis.R script in a text editor or RStudio, set the variable wrkdir on line 18 to the output of pwd above, and save the script.
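If you would rather not edit the file by hand, the same change can be sketched from the shell. This assumes the assignment in data_vis.R starts a line and has the form wrkdir <- "...", and that the script lives at scripts/data_vis.R. Both are guesses about the file, so check line 18 before relying on it. The function name set_wrkdir is ours.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the wrkdir assignment in an R script.
# Usage: set_wrkdir path/to/script.R /absolute/path/to/ST58_project
set_wrkdir() {
  script=$1
  dir=$2
  tmp=$(mktemp) || return 1
  # Replace any line starting with "wrkdir <-" (assumed form); other lines pass through.
  # "|" is the sed delimiter so slashes in the path need no escaping;
  # paths containing "|", "&" or backslashes are not handled by this sketch.
  sed "s|^wrkdir *<-.*|wrkdir <- \"$dir\"|" "$script" > "$tmp" && cat "$tmp" > "$script"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example, run from the repository root (script location is an assumption):
# set_wrkdir scripts/data_vis.R "$(pwd)"
```

Writing through a temporary file and copying it back avoids the differences between GNU and BSD sed -i, which matters on macOS.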
Run the data_vis.R script and watch your outputs folder magically fill with goodies.
- Figure 1. Metadata summary
- Figure 2. Core gene phylogenetic tree with metadata (Additional manual annotation in manuscript)
- Figure 3. Source and F RST distributions by BAP cluster
- Figure 4. pCERC4 heatmap (Exported in 2 parts; manually edited for publication)
- Figure 5. Source distribution by ColV status in ST58 Collection
- Figure 6. Tree-heatmap of presence/absence of BAP2 (ColV) positively- and negatively-associated genes (Additional manual annotation in manuscript)
- Figure 7. Virulence and resistance gene carriage rates by BAP cluster
- Figure 8. Source distributions of ColV in Enterobase genome collection
- Figure 9. Heatmap of low SNP distances between epidemiologically unrelated sequences
- Supplementary Data 1. Processed metadata and accession numbers for all sequences
- Supplementary Data 2. Gene presence/absence data
- Supplementary Data 3. Genes positively/negatively-associated with BAP clusters as identified by Roary and Scoary
- Supplementary Data 4. Enterobase metadata and ColV gene screening
- Fig. S1. Source distribution across all collection years
- Fig. S2. Serotype distributions by source and ColV+/-
- Fig. S3. FimH distributions by source and ColV+/-
- Fig. S4. F RST by source and ColV carriage
- Fig. S5. Heatmap of Liu ColV criteria gene carriage for ColV+ sequences (Manually edited in manuscript)
- Fig. S6. Absolute and relative ColV carriage by sources
- Fig. S7. BAP6 Scoary Heatmap (Additional manual annotation in manuscript)
- Fig. S8. Tree-heatmap of antimicrobial resistance genes
- Fig. S9. Tree-heatmap of virulence-associated genes
- Fig. S10. Tree-heatmap of plasmid replicon genes
- Fig. S11. Tree-heatmap of pairwise SNP distance between all sequences, mapped to the core gene phylogeny
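To confirm that a run actually produced the files listed above, a quick count of what landed in the outputs folder can help. The sketch below assumes only the layout described earlier (an outputs folder with figures and data subdirectories). It does not check individual file names, since those are not documented here. The function name count_outputs is ours.

```shell
#!/bin/sh
# Hypothetical helper: count files under outputs/figures and outputs/data.
# Usage: count_outputs [path/to/outputs]   (defaults to ./outputs)
count_outputs() {
  root=${1:-outputs}
  for sub in figures data; do
    if [ -d "$root/$sub" ]; then
      # tr strips the padding that some wc implementations add
      n=$(find "$root/$sub" -type f | wc -l | tr -d ' ')
      printf '%s: %s files\n' "$sub" "$n"
    else
      printf '%s: missing\n' "$sub"
    fi
  done
}

# Run from the repository root after data_vis.R has finished.
count_outputs outputs
```

A subdirectory reported as missing or holding 0 files usually means the script stopped early, so the R console output is the next place to look.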