This repository contains resources for the talk "MLOPS with R: An end-to-end process for building machine learning applications".
In addition to the slides (see below), this repository contains the complete set of code and GitHub Actions to deploy a Shiny application for calculating the probability of a fatal road accident. See below for instructions on how to deploy this application yourself.
As predictive models and machine learning become key components of production applications in every industry, an end-to-end Machine Learning Operations (MLOPS) process becomes critical for reliable and efficient deployment of applications that depend on R-based models. In this talk, I’ll outline the basics of the DevOps process and focus on the areas where MLOPS diverges. The talk will show the complete process of building and deploying an application driven by a machine learning model implemented with R. We will show the process of developing models, triggering model training on code changes, and triggering the CI/CD process for an application when a new version of a model is registered. We will use the Azure Machine Learning service and the “azuremlsdk” package to orchestrate the model training and management process, but the principles will apply to MLOPS processes generally, especially for applications that involve large amounts of data or require significant computing resources.
Aug 2020: New York R Conference (online).
MLOPS with R: An end-to-end process for building machine learning applications: slides (PDF) | Video Recording (forthcoming)
Links and other useful resources from the talk.
Azure Machine Learning service:
- Documentation
- Free Azure credits: register here. (A credit card is required, but it won't be charged unless you remove the spending limit.)
azuremlsdk R package:
- CRAN
- GitHub Repository
- Documentation
- Tutorial: Create a logistic regression model in R with Azure Machine Learning
GitHub Actions:
- Documentation
- An Unintentionally Comprehensive Introduction to GitHub Actions CI
- ML Ops with GitHub Actions and Azure Machine Learning
- GitHub Actions for the R Language
Visual Studio Code:
Data file nassCDS.csv:
- The app uses data from the US National Highway Traffic Safety Administration (with thanks to Mary C. Meyer and Tremika Finney). This dataset includes data from over 25,000 car crashes in the US, with variables you can use to predict the likelihood of a fatality.
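As a rough illustration of how variables like these can yield a fatality probability, here is a minimal logistic regression sketch in base R. It does not read nassCDS.csv: the rows are randomly simulated, the column names (`seatbelt`, `ageOFocc`, `dead`) are assumptions modelled on the dataset, and the coefficients used to generate the labels are made up. It is not the model used by the app.

```r
# Simulated stand-in for nassCDS.csv. These rows are randomly generated for
# illustration only and are NOT taken from the real dataset.
set.seed(42)
n <- 2000
crashes <- data.frame(
  seatbelt = factor(sample(c("belted", "none"), n, replace = TRUE)),
  ageOFocc = sample(16:90, n, replace = TRUE)
)

# Made-up relationship, used only to generate 0/1 labels for the sketch.
log_odds <- -4 + 1.2 * (crashes$seatbelt == "none") + 0.03 * crashes$ageOFocc
crashes$dead <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: models the log-odds of a fatality.
model <- glm(dead ~ seatbelt + ageOFocc, family = binomial, data = crashes)

# Predicted probability of a fatality for one hypothetical occupant.
new_case <- data.frame(
  seatbelt = factor("none", levels = levels(crashes$seatbelt)),
  ageOFocc = 45
)
p_fatal <- predict(model, newdata = new_case, type = "response")
print(round(unname(p_fatal), 3))
```

With the real data, the same `glm()` and `predict()` calls would apply once the outcome column is coded as a factor or as 0/1.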
Machine Learning Operations with R (January, 2020)
The application is a Shiny app running on an instance of the Azure Data Science VM. The Azure ML service trains the model and deploys the scoring endpoint from R scripts, and GitHub Actions orchestrates the app deployment.
- Fork this repository.
- Follow the directions in ML Ops with GitHub Actions and Azure Machine Learning to:
  - Create a resource group in your Azure subscription. (If you don't have a subscription, create an Azure Free Subscription and get $200 in free Azure credits.)
  - Create a service principal.
  - Add secrets to your forked repository.
  - Configure the .cloud\.azure\workspace.json file. You can use an existing Azure ML Workspace; if none exists with the specified name, one will be created for you.
- Deploy an Azure Data Science Virtual Machine and configure it as the Shiny Server by following these instructions.
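For the workspace.json step above, a minimal sketch of the file might look like the following. The field names reflect my understanding of the parameter file read by the Azure ML workspace GitHub Action and should be checked against the linked guide. The angle-bracket values are placeholders, not names from this repository.

```json
{
    "name": "<your-workspace-name>",
    "resource_group": "<your-resource-group-name>",
    "create_workspace": true
}
```

Under this assumption, `create_workspace` set to `true` is what lets the workflow create the workspace when none with that name exists.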