This study performed Random Forest regression analyses of human microbiota from multiple body sites (gut, mouth and skin). This repository provides all source data and code needed to generate the results in the manuscript. The output directories also contain additional exploratory analyses that help interpret our microbiota-based models for age prediction.
- Gut microbiota:
QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
---|---|---|---|---|
10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 2770 |
11757 | PRJEB18535 | GGMP regional variation | Regional variation greatly limits application of healthy gut microbiome reference ranges and disease models | 1609 |
- Oral microbiota:
- Skin microbiota:
QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
---|---|---|---|---|
10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 440 |
11052 | ERP021896 | Knight_ABTX | NA | 177 |
2010 | ERP012216 | Longitudinal babies project | Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer | 65 |
1841 | PRJEB5726, PRJEB5727, PRJEB5728 | Flores_SMP | Temporal variability is a personalized feature of the human microbiome | 1293 |
Although the skewed age distribution in the skin and oral microbiota datasets may decrease the accuracy of age prediction for older adults, it will not affect the conclusions about the relative ability of different human microbiomes to predict age.
This repository also contains R scripts and files that were used in preparing the manuscript. The main ones are explained below.
This meta-analysis depends on the self-developed R package crossRanger, which can be installed as follows.
## install.packages('devtools') # if devtools not installed
devtools::install_github('shihuang047/crossRanger')
The R script Age.crossRF_reg.ranger.R performs the meta-analysis of microbiota data for predicting chronological age. For each dataset (i.e. gut, mouth or skin), the script performs the following analyses.
- Data trimming (such as sample filtering by NA values in the metadata).
- RF modeling and performance evaluation for the whole dataset.
- RF modeling and performance evaluation for the sub-datasets. To test if confounders (such as sex) affected the modeling, we first trained the age model within a sub-dataset stratified by a confounder, then applied it on all the other sub-datasets. For both model training and testing, we evaluated regression performance using mean absolute error (MAE).
- Cross-application of the RF models built on the sub-datasets, with performance evaluated using MAE.
All the analyses can be run with this script, typically in RStudio or the R console.
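As a rough illustration of the cross-application step described above, the sketch below trains one model per sub-dataset and applies it to every other sub-dataset, scoring each pair by MAE. It is not the code in Age.crossRF_reg.ranger.R or the crossRanger API: the helper names (`mae`, `fit_rf`, `predict_rf`, `cross_apply_mae`) and the simulated data are assumptions made for this example, and the random forest is fitted directly with the ranger package.

```r
# Mean absolute error between observed and predicted ages.
mae <- function(observed, predicted) mean(abs(observed - predicted))

# Hypothetical wrappers around ranger (assumed to be installed when called).
fit_rf <- function(x, y, num_trees = 500, seed = 1) {
  ranger::ranger(x = x, y = y, num.trees = num_trees, seed = seed)
}
predict_rf <- function(model, x) predict(model, data = x)$predictions

# Train on each stratum, then test on every *other* stratum.
# The diagonal is left as NA: within-stratum performance would need
# out-of-bag or cross-validated predictions, which this sketch omits.
cross_apply_mae <- function(x, y, strata, fit = fit_rf, pred = predict_rf) {
  groups <- split(seq_along(y), strata)
  out <- matrix(NA_real_, length(groups), length(groups),
                dimnames = list(train = names(groups), test = names(groups)))
  for (tr in names(groups)) {
    idx_tr <- groups[[tr]]
    model <- fit(x[idx_tr, , drop = FALSE], y[idx_tr])
    for (te in names(groups)) {
      if (te == tr) next
      idx_te <- groups[[te]]
      out[tr, te] <- mae(y[idx_te], pred(model, x[idx_te, , drop = FALSE]))
    }
  }
  out
}

# Simulated stand-in for a feature table, ages and a confounder (sex).
set.seed(42)
n <- 120
x <- data.frame(f1 = runif(n), f2 = runif(n), f3 = runif(n))
age <- 20 + 40 * x$f1 + rnorm(n, sd = 2)
sex <- rep(c("female", "male"), length.out = n)

if (requireNamespace("ranger", quietly = TRUE)) {
  print(cross_apply_mae(x, age, sex))
}
```

If the off-diagonal MAE values are similar to each other, the stratifying variable is unlikely to be driving the age model. The `fit` and `pred` arguments make it possible to swap in another learner without changing the loop.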
Input | gut_data | oral_data | skin_data | Description |
---|---|---|---|---|
datafile | gut_data/gut_4434.biom | oral_data/oral_4014.biom | skin_data/skin_4168.biom | Biom-table file |
sample_metadata | gut_data/gut_4434_map.txt | oral_data/oral_2550_map.txt | skin_data/skin_1975_map.txt | Metadata file |
feature_metadata | gut_data/gut_taxonomy.txt | oral_data/oral_taxonomy.txt | skin_data/skin_taxonomy.txt | Feature metadata file |
prefix_name | gut_4434 | oral_2550 | skin_1975 | The prefix of datasets |
s_category | c("cohort", "sex") | "qiita_host_sex" | c("body_site","qiita_host_sex") | The metadata category for dividing datasets |
c_category | "age" | "qiita_host_age" | "qiita_host_age" | The targeted metadata category for RF modeling |
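For example, the gut column of the table above corresponds to the assignments below, followed by a minimal sketch of the data-trimming step (dropping samples with NA values in the relevant metadata columns). The helper `drop_incomplete_samples` and the toy metadata are hypothetical and only illustrate the idea; the actual script may handle missing values differently.

```r
# Parameters for the gut dataset, copied from the table above.
datafile         <- "gut_data/gut_4434.biom"
sample_metadata  <- "gut_data/gut_4434_map.txt"
feature_metadata <- "gut_data/gut_taxonomy.txt"
prefix_name      <- "gut_4434"
s_category       <- c("cohort", "sex")
c_category       <- "age"

# Hypothetical helper: keep only samples with no NA in the given columns.
drop_incomplete_samples <- function(metadata, columns) {
  keep <- stats::complete.cases(metadata[, columns, drop = FALSE])
  metadata[keep, , drop = FALSE]
}

# Toy metadata standing in for the real mapping file (rows are samples).
toy_metadata <- data.frame(
  cohort = c("A", "A", "B", NA),
  sex    = c("female", "male", NA, "male"),
  age    = c(34, NA, 51, 28),
  row.names = c("s1", "s2", "s3", "s4")
)

# Only s1 has complete values for cohort, sex and age.
trimmed <- drop_incomplete_samples(toy_metadata, c(s_category, c_category))
print(rownames(trimmed))
```

After trimming, the feature table would need to be subset to the same sample IDs so that rows of the two tables stay aligned.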
This folder includes all the input files (biom table, sample metadata and feature metadata files) necessary for the RF regression analysis.
This folder contains all of the output files from the main R script Age.crossRF_reg.ranger.R.
This folder contains selected output figures from the Output folder that were used to generate the final figures in our manuscript.
This work is supported by IBM Research AI through the AI Horizons Network. For more information visit the IBM AI Horizons Network website.