
Compilation of R packages and tools for doing data science and AI on Azure cloud

R on Azure

I noticed that although I have not actively maintained this site for a while, people are still starring and forking the repository. I am glad the content is still helpful to them. I will keep an eye on the repository. Readers and users, please feel free to contribute updates to the listed resources and references whenever you see the need. Thanks!

Cloud computing has made developing data science and AI solutions easier than ever. This GitHub repository collects R packages, tools, and case studies for doing data science with R on the Azure cloud.

R packages and tools

These packages and tools are categorized into four groups, representing four typical tasks that data scientists or AI developers frequently work on.

Category | Features
Cloud resource operation and administration | Simplify interaction with the Azure cloud platform and operation of resources on Azure for various tasks.
Scalable and advanced analytics | Enable large-scale and parallel data analytics in the R environment.
Remote interaction and access to cloud instances | Improve work efficiency for R-based analytics on the cloud.
Application and service deployment | Make it easy to operationalize a solution and deploy it as a service.

Cloud resource operation and administration

R packages and tools in this category offer a simplified way to interact with the Azure cloud platform and operate resources (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service, etc.) on Azure for various tasks.

  • AzureSMR - R package for managing a selection of Azure resources. Targeted at data scientists who need to control Azure resources without needing to bother administrators. APIs include Storage Blobs, HDInsight (Nodes, Hive, Spark), ARM, VMs.
  • AzureDSVM - R package that offers a convenient way to use the Azure Data Science Virtual Machine (DSVM), run scalable and elastic data science work remotely, and monitor on-demand resource consumption.
  • doAzureParallel - R package that allows users to submit parallel workloads in Azure.
  • rAzureBatch - an HTTP proxy library written in R for Azure Batch.
  • AzureML - an R interface to AzureML experiments, datasets, and web services.
  • AzureR - a family of packages for interacting with Azure from R.
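
doAzureParallel, listed above, works as a parallel backend for the foreach package, so workloads are written as ordinary foreach loops. The block below is a minimal sketch of that loop pattern, not the package's own example. It assumes only that the foreach package is installed and registers the sequential backend so it runs without an Azure account. The helper `simulate_closing_price` and its parameters are hypothetical and exist purely for illustration. With doAzureParallel one would register an Azure Batch-backed cluster instead; that setup needs credentials and is not shown here.

```r
library(foreach)

# Register the sequential backend so the sketch runs on any machine.
# Assumption: with doAzureParallel, an Azure-backed cluster would be
# registered here instead (setup not shown).
registerDoSEQ()

# Hypothetical helper: draw terminal prices from a lognormal model.
simulate_closing_price <- function(n_paths, s0 = 100, mu = 0.05,
                                   sigma = 0.2, horizon = 1) {
  z <- rnorm(n_paths)
  s0 * exp((mu - 0.5 * sigma^2) * horizon + sigma * sqrt(horizon) * z)
}

set.seed(42)

# Four independent tasks of 1000 simulated paths each; results are
# concatenated into one numeric vector by .combine = c.
prices <- foreach(i = 1:4, .combine = c) %dopar% {
  simulate_closing_price(n_paths = 1000)
}

length(prices)  # 4 tasks x 1000 paths = 4000 values
mean(prices)    # average simulated closing price
```

Because the loop body does not depend on which backend is registered, the same code can be tried locally first and moved to a parallel backend later.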

Scalable and advanced analytics

R packages and tools in this category allow one to perform large-scale R-based analytics on the cloud with cutting-edge frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, Keras, etc. NOTE: many of the tools are pre-installed and configured for direct use on the Azure Data Science Virtual Machine.

Scalable analytics

  • dplyrXdf - a dplyr backend for Revolution Analytics xdf files.
  • sparklyr - R interface for Apache Spark.
  • SparkR - an R package that provides a lightweight front end for using Apache Spark from R.

Deep learning

  • CNTK-R - R bindings to the CNTK library.
  • tensorflow - R interface to TensorFlow.
  • mxnet - The MXNet R package brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
  • keras - R interface to Keras.
  • darch - Create deep architectures in R.
  • deepnet - Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoder, and so on.
  • gpuR - R interface for GPU computing.

Composite

  • RevoScaleR - a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale.
  • MicrosoftML - a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
  • h2o - R interface to H2O.

Interaction and remote access

The R packages and tools in this category help data scientists and developers remotely access or interact with Azure cloud instances or services for convenient development.

  • mrsdeploy - an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provided.
  • R Tools for Visual Studio - IDE with R support.
  • RStudio Server - IDE for remote R session with access via Internet browser.
  • JupyterHub - Jupyter notebook with multi-user access.
  • IRkernel - R kernel for Jupyter notebook.

Application and service deployment

The R packages and tools in this category are used for deploying R-based analytics or applications as services or interfaces that can be conveniently consumed by end users or developers.

  • mrsdeploy - an R package that provides functions for deploying easily consumable services from within an R session.
  • AzureML - an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
  • Azure Container Instances - Azure service for running containerized R analytics on the cloud.
  • Azure Container Services - Azure service that simplifies deployment, management, and operation of orchestrated containers of R analytics.
  • Shiny Server - develop and publish Shiny-based web applications online.

Real-world use cases

The real-world use cases below showcase Azure cloud-based analytical solutions that involve the aforementioned R packages or tools.

Use case | Key R packages or tools
Campaign management | RevoScaleR, RTVS/RStudio
Customer churn prediction | RevoScaleR, MicrosoftML, RTVS/RStudio
Energy demand forecasting | RevoScaleR, MicrosoftML, RTVS/RStudio
Fraud detection | RevoScaleR, RTVS/RStudio
Galaxies classification | RevoScaleR, mrsdeploy, MicrosoftML, RTVS/RStudio
Performance test tuning | RevoScaleR, RTVS/RStudio
Predictive maintenance | RevoScaleR, RTVS/RStudio
Retail forecasting | RevoScaleR, RTVS/RStudio
Credit risk scoring | MicrosoftML, mrsdeploy, Shiny, RTVS/RStudio
Drop-out prediction | MicrosoftML, Jupyter Notebook
Product demand forecasting | RevoScaleR, RTVS/RStudio
Solar panel forecasting | AzureSMR, AzureDSVM, keras, RTVS/RStudio
Employee attrition prediction | AzureSMR, AzureDSVM, Azure Container Services, Shiny, RTVS/RStudio
Flight delay prediction | AzureSMR, AzureDSVM, MicrosoftML, SparkR, RTVS/RStudio
Monte Carlo price simulation | doAzureParallel, RTVS/RStudio