Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

microeco

An R package for data mining in microbial community ecology

Background

In microbial community ecology, the development of high-throughput sequencing techniques has increased both the amount and the complexity of data, making community data analysis and management a challenge. Many R packages have been created for microbiome profiling analysis. However, it is still difficult to perform data mining quickly and efficiently. Therefore, we created the R package microeco.

Main Features

  • R6 Class to store and analyze data; fast, flexible and modularized
  • Taxonomic abundance analysis
  • Venn diagram
  • Alpha diversity
  • Beta diversity
  • Differential abundance analysis
  • Indicator species analysis
  • Environmental data analysis
  • Null model analysis
  • Network analysis
  • Functional analysis

Installing R/RStudio

If you do not already have R/RStudio installed, do as follows.

  1. Install R
  2. Install RStudio

Add R to the system PATH environment variable, for example: your_directory\R-4.0.0\bin\x64

In RStudio, open Tools > Global Options > Packages and select an appropriate mirror under Primary CRAN repository.
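
The same mirror choice can also be made from the R console instead of the RStudio dialog. The following is a minimal sketch: the mirror URL is an assumption chosen for illustration (any CRAN mirror works), and the setting lasts only for the current session unless it is placed in a startup file such as .Rprofile.

```r
# Assumption: the cloud mirror is used for illustration; any CRAN mirror URL works.
cran_mirror <- "https://cloud.r-project.org"

# Set the primary CRAN repository for the current R session.
repos <- getOption("repos")
repos["CRAN"] <- cran_mirror
options(repos = repos)

# Show which repository install.packages() will now use.
print(getOption("repos")["CRAN"])
```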

Install microeco

Install the microeco package directly from CRAN.

install.packages("microeco")

Or install the latest development version from GitHub.

# If devtools package is not installed, first install it
install.packages("devtools")
# then install microeco
devtools::install_github("ChiLiubio/microeco")

Tutorial

See the detailed package tutorial (https://chiliubio.github.io/microeco_tutorial/) and the help documentation (e.g. ?microtable). If you want to run all the code in the tutorial, you need to install some additional packages; please see the Notes section below. Constructing the basic microtable object from other tools/platforms (e.g. QIIME, QIIME2, HUMAnN and phyloseq) can be easily achieved with the package file2meco (https://github.com/ChiLiubio/file2meco). The mecodev package (https://github.com/ChiLiubio/mecodev/) is designed to develop more classes for data analysis based on the microeco package.

Citation

Chi Liu, Yaoming Cui, Xiangzhen Li and Minjie Yao. 2021. microeco: an R package for data mining in microbial community ecology. FEMS Microbiology Ecology, 97(2): fiaa255. https://doi.org/10.1093/femsec/fiaa255

Notes

Important packages

To keep installation and first use simple, microeco depends on only a few packages, which are installed automatically from CRAN and are essential for data analysis. As a result, you may encounter an error like the following when using a class or function that invokes an additional package:

library(microeco)
data(dataset)
t1 <- trans_network$new(dataset = dataset, cal_cor = NA, taxa_level = "OTU", filter_thres = 0.0005)
t1$cal_network(network_method = "SpiecEasi")
Error in t1$cal_network(network_method = "SpiecEasi"): igraph package not installed ...

The reason is that network construction requires the igraph package. We do not list igraph and some other packages (e.g. SpiecEasi, hosted on GitHub) in the "Imports" field of the microeco package.

The solutions:

  1. Install the package when you encounter such an error. Installing packages from CRAN or Bioconductor is straightforward.

  2. Install the packages in advance. We recommend this solution if you are interested in most of the methods in the microeco package and want to repeat the analysis in the tutorial.
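
Either solution starts with knowing which optional packages are missing. The following is a minimal sketch using only base R; the helper name missing_packages is hypothetical and is not part of microeco. It checks availability with requireNamespace(), which loads a namespace without attaching the package.

```r
# Hypothetical helper (not part of microeco): return the names of packages
# that are not installed, without attaching any of them.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Two packages this README lists for network analysis.
to_install <- missing_packages(c("igraph", "rgexf"))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All optional packages are available.")
}
```

The install step is left commented out so that running the check has no side effects.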

The following packages are published on CRAN but are not installed automatically.

Package      | Where                              | Description
-------------|------------------------------------|--------------------------------------------
reshape2     | microtable class                   | data transformation
MASS         | trans_diff$new(method = "lefse",…) | linear discriminant analysis
GUniFrac     | cal_betadiv()                      | UniFrac distance matrix
ggpubr       | plot_alpha()                       | some plotting functions
randomForest | trans_diff$new(method = "rf",…)    | random forest analysis
ggdendro     | plot_clustering()                  | plotting clustering dendrogram
ggrepel      | trans_rda class                    | reduce the text overlap in the plot
agricolae    | cal_diff(method = "anova")         | multiple comparisons
gridExtra    | trans_diff class                   | merge plots
picante      | cal_alphadiv()                     | Faith’s phylogenetic alpha diversity
pheatmap     | plot_corr(pheatmap = TRUE)         | correlation heatmap with clustering dendrogram
tidytree     | trans_diff class                   | plot the taxonomic tree
igraph       | trans_network class                | network related operations
rgexf        | save_network                       | save network with gexf style
ggalluvial   | plot_bar(use_alluvium = TRUE)      | alluvial plot

Then, if you want to install all or some of these packages, you can do so as follows:

# If a package is not installed, it will be installed from CRAN.
# First select the packages of interest
packages <- c("reshape2", "MASS", "GUniFrac", "ggpubr", "randomForest", "ggdendro", "ggrepel", "agricolae", "gridExtra", "picante", "pheatmap", "igraph", "rgexf", "ggalluvial")
# Now check or install
lapply(packages, function(x) {
	if(!require(x, character.only = TRUE)) {
		install.packages(x, dependencies = TRUE)
	}})

Some other packages are also useful for certain functions. They may be R packages hosted on GitHub or Bioconductor, or tools written in other languages.

ggtree

Plotting the cladogram from the LEfSe result requires the ggtree package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/ggtree.html).

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ggtree")

SpiecEasi

The R package SpiecEasi can be used for network construction with the SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference) approach. The package can be installed from GitHub: https://github.com/zdk123/SpiecEasi
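
A GitHub installation can follow the same devtools pattern used for microeco above. The following is a sketch only: the wrapper name install_spieceasi is hypothetical, the repository path is taken from the URL above, and the call itself is left commented out because it needs network access and a working build toolchain.

```r
# Hypothetical wrapper (not part of microeco): install SpiecEasi from GitHub
# only when it is not already available.
install_spieceasi <- function() {
  if (requireNamespace("SpiecEasi", quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("zdk123/SpiecEasi")
  invisible(TRUE)
}

# Call it once, with network access:
# install_spieceasi()
```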

Gephi

Gephi is an excellent network visualization tool and is used to open the saved network file, i.e. network.gexf in the tutorial. You can download Gephi and learn how to use it from https://gephi.org/users/download/

WGCNA

In a correlation-based network, when the number of species is very large, the correlation algorithm in WGCNA is much faster than the 'cor' option in trans_network.

install.packages("WGCNA", dependencies = TRUE)

Tax4Fun

Tax4Fun is an R package used for the prediction of functional potential of prokaryotic communities.

  1. Install the Tax4Fun package
install.packages("RJSONIO")
install.packages(system.file("extdata", "biom_0.3.12.tar.gz", package="microeco"), repos = NULL, type = "source")
install.packages(system.file("extdata", "qiimer_0.9.4.tar.gz", package="microeco"), repos = NULL, type = "source")
install.packages(system.file("extdata", "Tax4Fun_0.3.1.tar.gz", package="microeco"), repos = NULL, type = "source")
  2. Download the SILVA123 reference data from http://tax4fun.gobics.de/, unzip SILVA123.zip, and move it to a location you can remember.

Tax4Fun2

Tax4Fun2 is another R package for the prediction of functional profiles and functional gene redundancies of prokaryotic communities. It has higher accuracy than PICRUSt and Tax4Fun. The Tax4Fun2 approach implemented in microeco differs slightly from the original package. Using the Tax4Fun2 approach requires the representative fasta file. Users do not need to install the Tax4Fun2 R package; they only need to download the blast tools and the Ref99NR/Ref100NR database.

  • Download the blast tools from "ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+", e.g. ncbi-blast-****-x64-win64.tar.gz for Windows.
  • Download Ref99NR.zip from "https://cloudstor.aarnet.edu.au/plus/s/DkoZIyZpMNbrzSw/download" or Ref100NR.zip from "https://cloudstor.aarnet.edu.au/plus/s/jIByczak9ZAFUB4/download".

Uncompress all the folders. The final folder structures should look like this:

blast tools:
|-- ncbi-blast-2.11.0+
|---- bin
|------ blastn.exe
|------ makeblastdb.exe
|------ ......

Ref99NR/Ref100NR:
|-- Tax4Fun2_ReferenceData_v2
|---- Ref99NR
|------ otu000001.tbl.gz
|------ ......
|------ Ref99NR.fasta
|------ Ref99NR.tre

The paths "ncbi-blast-2.11.0+/bin" and "Tax4Fun2_ReferenceData_v2" are required by the trans_func$cal_tax4fun2() function.
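
Before running the analysis, it can help to confirm that the unpacked folders match the layouts above. The following is a minimal base R sketch. The helper name check_tax4fun2_paths and its arguments are hypothetical and not part of microeco, the example locations are placeholders, and the assumption that a Ref100NR folder follows the same file naming as Ref99NR is not confirmed by this README.

```r
# Hypothetical helper (not part of microeco): check that the unpacked folders
# contain the files shown in the layouts above.
check_tax4fun2_paths <- function(blast_bin, ref_dir, ref_name = "Ref99NR") {
  # On Windows the executables end in .exe; elsewhere they have no extension.
  exe <- if (.Platform$OS.type == "windows") ".exe" else ""
  expected <- c(
    blastn      = file.path(blast_bin, paste0("blastn", exe)),
    makeblastdb = file.path(blast_bin, paste0("makeblastdb", exe)),
    ref_fasta   = file.path(ref_dir, ref_name, paste0(ref_name, ".fasta")),
    ref_tree    = file.path(ref_dir, ref_name, paste0(ref_name, ".tre"))
  )
  found <- file.exists(expected)
  names(found) <- names(expected)
  found
}

# Assumed example locations; replace them with your own directories.
status <- check_tax4fun2_paths(
  blast_bin = "ncbi-blast-2.11.0+/bin",
  ref_dir   = "Tax4Fun2_ReferenceData_v2"
)
print(status)
```

Any entry reported as FALSE points to a file that is missing or a path that needs correcting.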

# seqinr should be installed for reading and writing fasta file
install.packages("seqinr", dependencies = TRUE)
# Now we show how to read the fasta file
# see https://github.com/ChiLiubio/file2meco if you have not installed file2meco
rep_fasta_path <- system.file("extdata", "rep.fna", package="file2meco")
rep_fasta <- seqinr::read.fasta(rep_fasta_path)
# then see the help documentation of the microtable class for the rep_fasta argument of microtable$new().

Plotting

Most of the plotting in the package relies on ggplot2. We provide some parameters to adjust each plot. If you want to modify an output plot further, you can assign the output to a name and use ggplot2-style grammar to modify it as needed. Each data table used for plotting is stored in the object and can be extracted for personalized analysis and plotting. You can also directly modify the classes and reload them.
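
The assign-then-modify workflow can be sketched with plain ggplot2. The plot below is a stand-in built from the built-in mtcars data, not a microeco output; the titles, labels and output file name are illustrative assumptions.

```r
library(ggplot2)

# Stand-in plot built from the built-in mtcars data; in practice this would be
# the ggplot object returned by one of the package's plotting functions.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Modify the stored plot with ordinary ggplot2 labels and themes.
p2 <- p +
  theme_bw() +
  labs(title = "Customized plot", x = "Weight", y = "Miles per gallon") +
  theme(plot.title = element_text(hjust = 0.5))

# Save the customized plot to a PDF file in the session's temporary directory.
out_file <- file.path(tempdir(), "customized_plot.pdf")
ggsave(out_file, plot = p2, width = 6, height = 4)
```

Because the result is an ordinary ggplot object, any further layer, scale or theme can be added in the same way.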

Files from other tools to microtable object

Previous descriptions of how to construct a microtable object from QIIME, QIIME2 and phyloseq have been moved to the package file2meco (https://github.com/ChiLiubio/file2meco). The package file2meco is designed to convert files between other tools/platforms and the microtable object.

Contributing

We welcome any contribution, including but not limited to code, ideas and tutorials. Please report errors and questions on GitHub Issues. Any contribution via pull requests or email (liuchi0426@126.com) will be appreciated. By participating in this project you agree to abide by the terms outlined in the Contributor Code of Conduct.

References

  • Louca, S., Parfrey, L. W., & Doebeli, M. (2016). Decoupling function and taxonomy in the global ocean microbiome. Science, 353(6305), 1272. DOI: 10.1126/science.aaf4507
  • Nguyen, N. H., Song, Z., Bates, S. T., Branco, S., Tedersoo, L., Menke, J., … Kennedy, P. G. (2016). FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecology, 20(1), 241–248.
  • Põlme, S., Abarenkov, K., Henrik Nilsson, R. et al. FungalTraits: a user-friendly traits database of fungi and fungus-like stramenopiles. Fungal Diversity 105, 1–16 (2020). DOI: 10.1007/s13225-020-00466-2
  • Aßhauer, K. P., Wemheuer, B., Daniel, R., & Meinicke, P. (2015). Tax4Fun: Predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics, 31(17), 2882–2884.
  • Wemheuer, F., Taylor, J.A., Daniel, R. et al. Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences. Environmental Microbiome 15, 11 (2020). DOI: 10.1186/s40793-020-00358-7
  • Liu, C., Yao, M., Stegen, J. C., Rui, J., Li, J., & Li, X. (2017). Long-term nitrogen addition affects the phylogenetic turnover of soil microbial community responding to moisture pulse. Scientific Reports, 7(1), 17492.
  • Segata, N., Izard, J., Waldron, L., Gevers, D., Miropolsky, L., Garrett, W. S., & Huttenhower, C. (2011). Metagenomic biomarker discovery and explanation. Genome Biology, 12(6), R60.
  • Chi Liu, Yaoming Cui, Xiangzhen Li, Minjie Yao, microeco: an R package for data mining in microbial community ecology, FEMS Microbiology Ecology, Volume 97, Issue 2, February 2021, fiaa255.
  • An, J., Liu, C., Wang, Q., Yao, M., Rui, J., Zhang, S., & Li, X. (2019). Soil bacterial community structure in Chinese wetlands. Geoderma, 337, 290–299.
  • Tackmann, J., Matias Rodrigues, J. F., & Mering, C. von. (2019). Rapid inference of direct interactions in large-scale ecological networks from heterogeneous microbial sequencing data. Cell Systems, 9(3), 286–296 e8.
  • White, J., Nagarajan, N., & Pop, M. (2009). Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Computational Biology, 5(4), e1000352.
  • Kurtz ZD, Muller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol 2015; 11: e1004226.
  • McMurdie PJ, Holmes S (2013) phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLOS ONE 8(4): e61217.
  • Paulson, J., Stine, O., Bravo, H. et al. Differential abundance analysis for microbial marker-gene surveys. Nat Methods 10, 1200–1202 (2013). DOI: 10.1038/nmeth.2658
  • Deng Y, Jiang Y-H, Yang Y, He Z, Luo F, Zhou J. Molecular ecological network analyses. BMC bioinformatics 2012; 13: 113.
  • Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, et al. Vegan: Community ecology package. 2019.
  • Picante: R tools for integrating phylogenies and ecology. Bioinformatics 2010; 26: 1463–1464.