Welcome to the Microbial Informatics 2014 labs. This page contains a number of tutorials on performing data analysis on whole genome sequencing data for the Microbial Informatics workshop hosted at the National Microbiology Laboratory in Winnipeg, Canada. These labs can be accessed online at https://github.com/apetkau/microbial-informatics-2014.
The data for these labs is a set of whole genome sequencing data from a number of V. cholerae strains from the outbreak of cholera in Haiti beginning in 2010, as well as a number of other V. cholerae strains included for comparison. This data was previously published in http://mbio.asm.org/content/4/4/e00398-13.abstract and http://mbio.asm.org/content/2/4/e00157-11.abstract and is available on NCBI's Sequence Read Archive. A table of the specific data used within this lab is given below.
Strain | Location | Year | NCBI Accession |
---|---|---|---|
2010EL-1786 | Haiti | 2010 | NC_016445.1,NC_016446.1 |
2010EL-1749 | Cameroon | 2010 | SRR773655 |
2010EL-1796 | Haiti | 2010 | SRR771582 |
2010EL-1798 | Haiti | 2010 | SRR074109 |
2011EL-2317 | Haiti | 2011 | SRR773175 |
2012V-1001 | United States | 2011 | SRR892331 |
3554-08 | Nepal | 2008 | SRR774919 |
C6706 | Peru | 1991 | SRR774920 |
VC-1 | Banke district, Nepalgunj municipality | 2010 | SRR308665 |
VC-10 | Banke district, Nepalgunj municipality | 2010 | SRR308707 |
VC-14 | Banke district, Nepalgunj municipality | 2010 | SRR308715 |
VC-15 | Dang Deokhuri district, Narayanpur VDC | 2010 | SRR308716 |
VC-18 | Banke district, Nepalgunj municipality | 2010 | SRR308721 |
VC-19 | Kathmandu district, Kathmandu city | 2010 | SRR308722 |
VC-25 | Rupandehi district, Butawal municipality | 2010 | SRR308726 |
VC-26 | Rupandehi district, Butawal municipality | 2010 | SRR308727 |
VC-6 | Banke district, Nepalgunj municipality | 2010 | SRR308703 |
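The labs use the data archive described further down, so none of the reads need to be fetched by hand. As a rough sketch only, the SRR accessions in the table could be looped over as shown here. It assumes the NCBI SRA Toolkit commands `prefetch` and `fasterq-dump` are installed, which this page does not state, and it only prints the commands rather than running them.

```shell
# SRR accessions copied from the table above. The reference genome
# 2010EL-1786 (NC_016445.1, NC_016446.1) is a RefSeq record rather than
# a read set, so it is left out of this list.
ACCESSIONS="SRR773655 SRR771582 SRR074109 SRR773175 SRR892331 SRR774919 SRR774920 SRR308665 SRR308707 SRR308715 SRR308716 SRR308721 SRR308722 SRR308726 SRR308727 SRR308703"

# Dry run: print one download command per accession instead of executing it.
# Assumption: the SRA Toolkit is installed; it is not part of these instructions.
for acc in $ACCESSIONS; do
    echo "prefetch $acc && fasterq-dump $acc"
done
```

Removing the `echo` and the surrounding quotes would run the downloads for real, which needs network access and considerably more disk space than the prepared archive.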
These labs will go through data analysis on the above strains. We will not reproduce the exact figures from the publications, but the labs should help you get started working with microbial whole genome sequence data.
These labs assume that you are familiar with working in a Linux environment using the command line.
July 31, 2020: Note the virtual machines are no longer available.
All necessary software to run these labs is provided in the form of a customized Ubuntu virtual machine. You will need to install software such as Oracle VirtualBox in order to run the virtual machine. Please see the Workshop Software instructions for more details.
The data for these labs is provided separately in the file microbial-informatics-2014-data.tar.bz2 and can be downloaded from https://share.corefacility.ca/public.php?service=files&t=2fb62f38f4828897ca24efe8fc181a0c. This is approximately 1.1 GB. Please download this file from within the virtual machine. Once downloaded, the data can be extracted to a directory, Course/, with the following command.
$ tar -xvvjf microbial-informatics-2014-data.tar.bz2
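Before extracting, it can help to check what the archive will create. The helper below is a hypothetical convenience and not part of the lab materials. It uses `tar -t` to list contents instead of extracting them, with `-j` selecting bzip2 and `-f` naming the archive file.

```shell
# Hypothetical helper: print the distinct top-level entries of a .tar.bz2
# archive without extracting anything.
#   -t  list contents    -j  bzip2 compression    -f  archive file
list_top_level() {
    tar -tjf "$1" | cut -d/ -f1 | sort -u
}

# Example call (needs the downloaded file in the current directory):
#   list_top_level microbial-informatics-2014-data.tar.bz2
```

Given the description above, the listing for the course archive would be expected to show a single Course entry.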
For the remainder of these labs, please replace any references to /Course with the directory that was just extracted. For example, if the files were extracted within the Downloads directory and a command is given to copy files from /Course, please copy the files from ~/Downloads/Course.
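One way to keep track of this substitution is a shell variable plus a small function that rewrites paths. This is only a sketch: the names `COURSE_DIR` and `course_path` are made up for illustration and are not used anywhere in the labs.

```shell
# Hypothetical variable: where the archive was actually extracted.
# Defaults to the ~/Downloads/Course example from the text.
COURSE_DIR="${COURSE_DIR:-$HOME/Downloads/Course}"

# Translate a path written as /Course/... in the labs into the real location
# by stripping the leading /Course and prepending COURSE_DIR.
course_path() {
    printf '%s\n' "${COURSE_DIR}${1#/Course}"
}

# Example: show where a lab path would resolve on this machine.
course_path /Course/data
```

A lab command that copies from a path under /Course could then be written with `"$(course_path /Course/...)"` as the source, filling in the path the lab gives.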
Once the virtual machine is running and the data is downloaded, the instructions for these labs can be obtained by running the following.
$ git clone https://github.com/apetkau/microbial-informatics-2014.git
This will copy all the instructions and other needed files to a directory, microbial-informatics-2014/.
Day 6: May 14, 2014 | Day 7: May 15, 2014 |
---|---|
8:45-10:15 am: Ortholog detection with OrthoMCL | 12:30-2:00 pm: Whole Genome SNP Phylogenomics |
10:30-12:15 pm: Working with GView Server | 2:15-3:15 pm: Feature Frequency Profile Phylogenies |
3:00-4:45 pm: Minimum Spanning Trees with PHYLOViZ | |