/mian

Mian is a platform for 16S rRNA OTU table visualization, data analysis, and feature selection

Primary LanguageJavaScriptMIT LicenseMIT

description
OTU table visualization, data analysis, and machine learning

Mian Overview

What is Mian?

Mian is an open-source web-platform that enables data exploration and analysis of OTU or ASV data with respect to gene expression, categorical and numerical metadata (eg. immunohistological data, experimental conditions), and taxonomic structure.

Mian gives researchers access to interactive data visualizations, machine learning models and feature selection tools, and automated statistical assessments. No code required.

What's Unique About Mian?

  • Apply linear regression or a deep-neural network on your data without any additional setup
  • Automatically find important genes or histological features
  • Track your work over time with experiment notebooks
  • Bucket quantitative measures to access categorical tools (eg. define thresholds to categorize different severities of a disease)
  • Filter by specific taxonomic groups, OTUs, or sample groups

Interactive Visualizations

Common features include detail highlights, tuneable settings, filters, notebook snapshots, and downloadable images.

  • Boxplots
    • Create boxplots of specific OTUs or taxonomic groups against categorical variables
    • Apply color according to a second categorical variable
    • Automated parametric/non-parametric statistical tests
  • Barplots or Donut Charts
    • Examine OTU composition at different taxonomic levels or categorical variables
  • Scatterplots
    • Plot OTU composition, gene expression, alpha diversity, or aggregate counts against each other
    • Color by categorical variables or resize points by quantitative variables
  • Heatmap
    • Explore correlations of OTU counts or quantitative variables
    • Visualize composition of taxonomic groups across samples
  • Correlation Network
    • Cluster correlated samples and color by categorical variables
  • Rarefaction Curves
    • Subsampling depths at varying sample sizes
  • Taxonomic Trees
    • Compare OTU counts on a taxonomic tree across different categorical variables

Boxplot visualization example with automated statistical test result

Alpha and Beta Diversity

Explore different microbial diversity measures using boxplots, scatterplots, and NMDS/PCoA plots.

  • Alpha Diversity
    • Compare species diversity measures on a boxplot or scatterplot across sample groups
    • Available measures include: Shannon, Simpson, Faith's Phylogenetic Diversity
  • Beta Diversity
    • Compare differences of diversity between different sample groups using a boxplot
    • Available measures include: Bray-Curtis, Jaccard, Sorenson, Whittaker, Weighted/Unweighted Unifrac
  • NMDS plot
    • Visualize beta diversity using a non-metric multidimensional scaling plot on different sample groups
  • PCoA plot
    • 2D or 3D principal component analysis using different distance measures and categorical variables

Feature Selection

Automatically find OTUs, taxonomic groups, or ‌metadata that can either correlate well together, or were determined to be important due to a statistical test.

  • Boruta (Random Forest) Classification
    • Find the OTUs or taxonomic groups that differentiate sample groups according to a Random Forest using the Boruta feature selection algorithm
  • Elastic Net Classification and Regression
    • Use varying regularization in Elastic Net to find OTUs or taxonomic groups that differentiate between sample groups
    • Assess important of selected features using a machine learning model
  • Fisher Exact Test
    • Apply a presence-absence test on OTUs or taxonomic groups across different sample groups
  • Differential Selection
    • Look for OTUs or taxonomic groups that differentiate between two sample groups according to a statistical test and corresponding FDR-corrected q-value
  • Correlation Selection
    • Look for genes, OTUs or taxonomic groups, or quantitative metadata that correlate with another gene , OTU or taxonomic group, quantitative metadata, or alpha diversity according to a statistical test and corresponding FDR-corrected q-value

Machine Learning

Assess the ability of your dataset to generalize using a machine learning model by creating a classifier or regressor on a training dataset and evaluating on a test dataset.

  • Linear Regressor and Classifier
    • Choose an experimental variable to train and test the ability of your linear model (with regularization) to predict the variable
  • Random Forest Classifier
    • Choose an categorical variable to train and test the ability of a random forest model to predict the variable
  • Deep Learning
    • Design a deep neural network with dropout to assess the ability of the model to predict a chosen variable

Libraries Used

Mian is built using HTML/CSS/JavaScript for the front-end and Python and R for the back-end.

  • Bootstrap 3: CSS styling
  • jQuery: Front-end dynamic website interaction handling
  • Flask: Python-based back-end server
  • flask-login: Enables login capabilities for the Mian website
  • biom-format: Allows processing of user-uploaded standard biological matrix format (BIOM)
  • h5py: Allows processing of HDF5 formatted files
  • rpy2: Interface between Python and R code
  • scikit-learn: Machine learning algorithms implemented in Python
  • scipy: Python library used for scientific processing
  • werkzeug: WSGI application for Python
  • scikit-bio: Python library used for bioinformatics
  • pandas: Provides high-performance data structure and matrix manipulation
  • vegan: R library for community ecology analysis
  • RColorBrewer: Color maps for R
  • ranger: Fast random forest implementation
  • Boruta: All-relevant feature selection algorithm
  • glmnet: Lasso and elastic-net regularized linear models

Publications

Mian is currently described in a pre-print paper in bioRxiv: https://www.biorxiv.org/content/early/2018/09/14/416073

Support

Mian is built at the University of British Columbia and supported by the Providence Health Care Research Institute and Centre for Heart Lung Innovation at St. Paul's Hospital