Growth Rate Index (GRiD) measures bacterial growth rate from reference genomes (including draft quality genomes) and metagenomic bins at ultra-low sequencing coverage (> 0.2x). GRiD is described in
Emiola and Oh (2018) "High throughput in situ metagenomic measurement of bacterial replication at ultra-low sequencing coverage." Nature Communications (https://www.nature.com/articles/s41467-018-07240-8).
The GRiD algorithm consists of two modules:
- < single > - which is applicable for growth analysis involving a single reference genome
- < multiplex > - for the high-throughput growth analysis of all identified bacteria in a sample. Prior knowledge of microbial composition is not required. To use this module, you must download the GRiD database.
The comprehensive GRiD database consists of 32,819 representative bacterial genomes and can be obtained from ftp://ftp.jax.org/ohlab/Index/.
However, to reduce runtime, we provide environment-specific databases, each built from microbes mostly found in a specific microbial niche. These can be retrieved from ftp://ftp.jax.org/ohlab/GRiD_environ_specific_database/. For instance, if you are analyzing stool samples, it is advisable to download and extract the stool database ftp://ftp.jax.org/ohlab/GRiD_environ_specific_database/stool_microbes.tar.gz.
This version is an adaptation of GRiD that runs in Docker. The advantage is that no dependencies need to be installed; downloading the image is enough. To start a container, run:
docker run --rm -it -v $PWD:/data/ gaarangoa/grid:1.0 /bin/bash
Make sure to pass the -v (--volume) flag so that you can access your data on the host machine from inside the container. That is all that is needed.
Thanks to Emiola and Oh (2018) for developing this tool!
(See the example for how to use the Docker image.)
grid single <options> GRiD using a single genome
grid multiplex <options> GRiD high throughput
grid -v Version
grid -h Display this help message
grid single <options>
<options>
-r Reads directory (single end reads)
-o Output directory
-g Reference genome (fasta)
-l Path to file listing a subset of reads
for analysis [default = analyze all samples in reads directory]
-n INT Number of threads [default 1]
-h Display this message
grid multiplex <options>
<options>
-r Reads directory (single end reads)
-o Output directory
-d GRiD database directory
-c FLOAT Coverage cutoff (>= 0.2) [default 1]
-p Enable reassignment of ambiguous reads using Pathoscope2
-t INT Theta prior for reads reassignment [default 0]. Requires the -p flag
-l Path to file listing a subset of reads
for analysis [default = analyze all samples in reads directory]
-m merge output tables into a single matrix file
-n INT Number of threads [default 1]
-h Display this message
NOTE: Sample reads must be in single-end format. If reads are only available in paired-end format, use either one of the mate files, or concatenate both mates into a single fastq file. In addition, reads must have the .fastq extension (and not .fq). Also, when specifying the output folder with the -o flag, select either a non-existing folder (in which case GRiD will create it) or an existing but empty folder.
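The input preparation described in the note above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name merge_pairs and the <sample>_1.fq / <sample>_2.fq naming scheme are hypothetical and not part of GRiD, and the mates are assumed to be uncompressed.

```shell
# Hypothetical helper (not part of GRiD): merge paired-end mates into
# single-end files carrying the .fastq extension that GRiD expects.
# Assumes uncompressed mates named <sample>_1.fq and <sample>_2.fq;
# adjust the patterns to match your own naming scheme.
merge_pairs() {
    local in_dir="$1" out_dir="$2"
    local r1 r2 sample
    mkdir -p "$out_dir"
    for r1 in "$in_dir"/*_1.fq; do
        [ -e "$r1" ] || continue            # nothing matched the pattern
        r2="${r1%_1.fq}_2.fq"
        sample="$(basename "${r1%_1.fq}")"
        if [ -e "$r2" ]; then
            # Concatenate both mates into one single-end file
            cat "$r1" "$r2" > "$out_dir/$sample.fastq"
        else
            # Second mate missing: use the first mate on its own
            cp "$r1" "$out_dir/$sample.fastq"
        fi
    done
}
```

For example, merge_pairs raw_reads reads writes one .fastq file per sample into a new reads folder, which can then be passed to the -r flag.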
In both the 'single' and 'multiplex' modules, all samples present in the reads directory are analyzed by default. However, analysis can be restricted to a subset of samples by using the -l flag and specifying a file that lists the subset of samples.
For the 'multiplex' module, reads mapping to multiple genomes are reassigned using Pathoscope 2 when the -p flag is set. The degree to which reads are reassigned is set by the -t (theta prior) flag. The theta prior value represents the number of non-unique reads that are not subject to reassignment. Finally, when the coverage cutoff (-c flag) is set below 1, only genomes with fragmentation levels below 90 fragments/Mbp are analyzed (see Emiola & Oh, 2018 for more details). Note that to use the 'multiplex' module, you must have downloaded either the environ_specific_database (ftp://ftp.jax.org/ohlab/GRiD_environ_specific_database/) or the comprehensive database (ftp://ftp.jax.org/ohlab/Index/). However, you do not need the database to run the example test.
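Before choosing a coverage cutoff, it can help to get a rough idea of how much coverage a sample could provide for a given genome. The sketch below divides the total number of read bases by the genome length. It is not how GRiD computes coverage (GRiD works from mapped reads); the function name estimate_coverage is hypothetical, every read is assumed to come from that one genome (so the result is an upper bound), and the FASTQ file is assumed to use plain four-line records.

```shell
# Hypothetical helper (not part of GRiD): upper-bound coverage estimate,
# computed as total read bases divided by genome length.
# Assumes four-line FASTQ records and an uncompressed FASTA genome.
estimate_coverage() {
    local reads="$1" genome="$2"
    local read_bases genome_len
    # In four-line FASTQ, the sequence is line 2 of every record
    read_bases="$(awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' "$reads")"
    # In FASTA, every line that is not a header is sequence
    genome_len="$(awk '!/^>/ { n += length($0) } END { print n + 0 }' "$genome")"
    awk -v b="$read_bases" -v g="$genome_len" \
        'BEGIN { if (g == 0) exit 1; printf "%.2f\n", b / g }'
}
```

For example, estimate_coverage sample.fastq genome.fna prints a single number; a value well below 0.2 suggests that the genome is unlikely to pass even the lowest cutoff.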
single module
- two output files are generated
- A plot (.pdf) showing coverage information across the genome
- A table of results (.txt) displaying growth rate score (GRiD), 95% confidence interval, unrefined GRiD value, species heterogeneity, genome coverage, dnaA/ori ratio, and ter/dif ratio. Species heterogeneity is a metric estimating the degree to which closely related strains/species contribute to variance in growth predictions (it ranges from 0 to 1, where 0 indicates no heterogeneity). In most bacterial genomes, dnaA is located in close proximity to the ori, whereas replication typically terminates at or near the dif sequence. Thus, the closer the dnaA/ori and ter/dif ratios are to one, the more likely the GRiD score is accurate.
multiplex module
- two output files are generated per sample
- A table of results (.txt) displaying growth rate score (GRiD) for ALL genomes above the coverage cutoff, unrefined GRiD value, species heterogeneity, and genome coverage. If -m flag is set, all tables will be merged into a single matrix file called "merged_table.txt".
- A heatmap (.pdf) displaying growth rate score (GRiD) for the 70 most abundant species above the coverage cutoff, with hierarchical clustering.
Some notes to keep in mind when using the 'multiplex' module to enhance performance:
- If possible, use the smaller environ_specific_database GRiD database.
- Subsample very large files to considerably reduce runtime.
- Whenever possible, run your analyses with and without the Pathoscope reassignment (-p) option and compare results. You can also fine-tune the reads reassignment parameters using the -t flag.
- Also, you may want to filter your results based on 'Species heterogeneity'. Typically, growth estimates with 'Species heterogeneity' values < 0.3 are very reliable.
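The filtering suggested in the last note can be sketched with awk. The sketch assumes that the results table is tab-delimited and has a header column named exactly "Species heterogeneity"; both are assumptions to check against your own output files. The function name filter_heterogeneity is hypothetical and not part of GRiD.

```shell
# Hypothetical helper (not part of GRiD): keep the header plus the rows
# whose 'Species heterogeneity' value is below a threshold (default 0.3).
# Assumes a tab-delimited table with a header line.
filter_heterogeneity() {
    local table="$1" max="${2:-0.3}"
    awk -F '\t' -v max="$max" -v name="Species heterogeneity" '
        NR == 1 {
            # Locate the column by its header name, not by position
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
            print
            next
        }
        ($col + 0) < max    # numeric comparison against the threshold
    ' "$table"
}
```

For example, filter_heterogeneity results.txt 0.3 > results.filtered.txt keeps only the more reliable estimates. Looking the column up by name avoids depending on the column order, which differs between the 'single' and 'multiplex' tables.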
The test sample contains reads from Staphylococcus epidermidis, Lactobacillus gasseri, and Campylobacter upsaliensis, each with coverage of ~ 0.5. Download the GRiD folder and run the test as shown below.
If you created a conda environment prior to installation using the example in the "Installation" section, then always activate the environment prior to running GRiD using the command source activate GRiD. Otherwise, skip this step.
Start the Docker container, making sure to mount the directory that holds your data:
docker run --rm -it -v $PWD:/data/ gaarangoa/grid:1.0 /bin/bash
cd /data/
Now you can run the example!
wget https://github.com/ohlab/GRiD/archive/1.2.tar.gz
tar xvf 1.2.tar.gz
cd GRiD-1.2/test
grid single -r . -g S_epidermidis.LRKNS118.fna -o output_single
grid multiplex -r . -d . -p -c 0.2 -o output_multiplex -n 16
(This command tells GRiD to reassign ambiguous reads, analyze genomes with coverage above the 0.2 cutoff, and use 16 threads.)
For each module, output files (a pdf and a text file) are generated in the test folder.
The database can be updated with metagenomic bins or newly sequenced bacterial genomes by running the update_database script.
update_database <options>
<options>
-d GRiD database directory (required)
-g Bacterial genomes directory (required)
-p Prefix for new database (required)
-l Path to file listing specific genomes
for inclusion in database [default = include all genomes in directory]
-h Display this message
NOTE: Genomes must be in fasta format and must have either .fasta, .fa or .fna extensions.
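Before running update_database, it can be worth checking which files in the genomes directory have one of the accepted extensions. The function below is a minimal sketch of such a check; its name, list_valid_genomes, is hypothetical and it is not part of GRiD.

```shell
# Hypothetical helper (not part of GRiD): print the names of files whose
# extension is accepted (.fasta, .fa or .fna) and report the rest on stderr.
list_valid_genomes() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue             # skip sub-directories
        case "$f" in
            *.fasta|*.fa|*.fna) basename "$f" ;;
            *) echo "skipping (unsupported extension): $f" >&2 ;;
        esac
    done
}
```

For example, list_valid_genomes new_genomes prints one accepted file name per line, which makes it easy to spot genomes that need to be renamed before updating the database.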