The aim of this all-day workshop is to familiarize you with the Microsoft Azure public cloud and some of its Big Data and Machine Learning technologies.
By the time you leave the workshop, you should know the following:
- How to navigate Azure and spin up new resources to do your work
- Which data collection and sanitization techniques and technologies are available
- How machine learning platforms are implemented and how they can facilitate your research aspirations
- How to have fun working with scaled, (almost) unlimited cloud resources
**IMPORTANT: Navigate over to the Pre-Reqs for ML Day before attempting the labs**
- 0830-0930: This session gives a quick overview of Azure, how we will interact with the different technologies of the day, and which Big Data and Machine Learning technologies we will be using.
- 0930-1030: Our first lab looks at methods for collecting large data sets for analysis. We'll start with an IoT simulator that simulates large data sets coming into your Azure subscription. In the following sessions, we'll learn how to sanitize and then analyze these large amounts of data.
- NOTE: When you get to the prompt that says Activate Sandbox, you may skip that as you already have a subscription and resource group provisioned.
- NOTE: Only one Free Tier is allowed per subscription. Use "S1" when prompted for the tier.
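
To make the idea of simulated device data concrete, here is a minimal local sketch of what an IoT simulator produces. It is not the simulator used in the lab and it does not connect to Azure; the function name `simulate_telemetry` and the message fields (`deviceId`, `timestamp`, `temperature`, `humidity`) are assumptions chosen for illustration.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def simulate_telemetry(device_count, messages_per_device, seed=0):
    """Yield JSON strings that mimic device telemetry (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for tick in range(messages_per_device):
        for device in range(device_count):
            message = {
                "deviceId": f"sim-device-{device:03d}",
                "timestamp": (start + timedelta(seconds=tick)).isoformat(),
                "temperature": round(rng.uniform(15.0, 35.0), 2),
                "humidity": round(rng.uniform(30.0, 90.0), 2),
            }
            yield json.dumps(message)


# Generate a small batch locally; a real simulator would send each message
# to a cloud ingestion endpoint instead of collecting it in a list.
messages = list(simulate_telemetry(device_count=3, messages_per_device=4))
print(len(messages), "messages generated; first:", messages[0])
```

Raising `device_count` and `messages_per_device` is the simplest way to get a feel for how quickly telemetry volume grows.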
- 1045-1145: Our second lab will analyze and sanitize a large amount of data similar to what we generated in Lab 1. Our objective is to visualize this large dataset, sanitize it, and then infer predictions from the previously collected data using an ML algorithm.
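
The sanitize-then-predict flow described for the second lab can be sketched in a few lines. This is only an illustration under stated assumptions: the lab uses Azure tooling rather than hand-written code, the valid range in `sanitize` is made up, and ordinary least-squares regression stands in for whichever ML algorithm the lab actually applies.

```python
def sanitize(readings, low=-40.0, high=60.0):
    """Keep (x, y) pairs whose y value is present and inside [low, high]."""
    return [(x, y) for x, y in readings if y is not None and low <= y <= high]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical raw data: one missing value and one obvious sensor glitch.
raw = [(0, 20.0), (1, 22.0), (2, None), (3, 999.0), (4, 28.0), (5, 30.0)]
clean = sanitize(raw)
slope, intercept = fit_line(clean)
prediction = slope * 6 + intercept  # predicted reading at x = 6
print(f"kept {len(clean)} of {len(raw)} readings; prediction at x=6: {prediction:.1f}")
```

Without the sanitize step, the single 999.0 reading would dominate the fit, which is why cleaning comes before modeling.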
Session 4: HDInsight (Hadoop Made Easy). We didn't get around to this session. Feel free to work through it on your own!
- 1230-1330: Our third lab creates a Hive table in Hadoop, one of the most prominent open-source big data and analytics engines, and demonstrates how to manipulate large amounts of data (in this case, weblogs) and make sense of it with minimal effort.
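
The lab itself works in Hive on HDInsight. As a rough local analogue of the kind of question a Hive query answers over weblogs, the sketch below counts requests per HTTP status code, much like a `GROUP BY` would. It assumes Common Log Format lines, which may not match the lab's sample data; the pattern and the function `count_by_status` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$'
)


def count_by_status(lines):
    """Count requests per HTTP status code, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # ignore lines that do not fit the assumed format
        counts[int(match.group(4))] += 1
    return counts


sample = [
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "POST /api HTTP/1.1" 200 128',
    'not a log line',
]
status_counts = count_by_status(sample)
print(dict(status_counts))
```

Hive's appeal is that the same aggregation is expressed declaratively and runs across a cluster, so it keeps working when the logs no longer fit on one machine.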
- 1330-1430: This lab will set up a Databricks workspace. Databricks is a hosted Apache Spark environment, simplified so that minimal infrastructure knowledge is needed to run Spark Machine Learning or Big Data notebooks on the platform.
- Databricks Setup and Lab 1 Hands-On Lab
- Exercise 1 and Exercise 2
- 1445-1600: In Exercise 3, we'll use the open-source TensorFlow ML library to analyze the data even further by deploying a simple Deep Neural Network that classifies claims data.
If we have time, Exercise 4 leverages pre-built, compiled, and inexpensive public Azure services to analyze the text with Microsoft's Text Analytics API, which is part of the Cognitive Services toolkit. These services can be leveraged securely from ANY code, anywhere, so long as the code has access to the internet.
- Azure Databricks + TensorFlow Hands-On Lab
- Exercise 3 & 4 (if you have time)
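
Because the Text Analytics API used in Exercise 4 is a plain HTTPS service, calling it from your own code mostly comes down to building one request. The sketch below only constructs the request and does not send it. The endpoint host and key are placeholders, the helper `build_sentiment_request` is hypothetical, and the `v3.0` sentiment path is an assumption about the API version, which may differ from what the lab uses.

```python
import json
import urllib.request


def build_sentiment_request(endpoint, key, texts, language="en"):
    """Build (but do not send) a sentiment request for a list of texts."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    # Assumed API version path; check your resource's documentation.
    url = endpoint.rstrip("/") + "/text/analytics/v3.0/sentiment"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_sentiment_request(
    "https://example-resource.cognitiveservices.azure.com/",  # placeholder
    "<your-key>",  # placeholder; never commit a real key
    ["The workshop was great.", "The setup took too long."],
)
print(request.get_method(), request.full_url)
# To actually call the service, pass the request to urllib.request.urlopen()
# using a real endpoint and key from your own Azure resource.
```

Keeping the key in an environment variable or a secret store, rather than in the notebook itself, is the usual way to keep such calls secure.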
Thanks for spending time with Microsoft today. We're ALWAYS thrilled to come and speak about our technology and we'd love to hear back from you! Please see below for our contact information and survey.
- Email: joey.brakefield@microsoft.com
- Twitter: http://twitter.com/kfprugger
- LinkedIn: https://www.linkedin.com/in/joeybrakefield/
- GitHub: https://github.com/kfprugger
- Email: shimail.gillani@microsoft.com
- Twitter: https://twitter.com/chefgillani
- LinkedIn: https://www.linkedin.com/in/chefgillani/
- GitHub: https://github.com/ChefGillani