Institute for Advanced Analytics, 2023

Distributed Services for Machine Learning - Dan Zaratsian, March 2023


IAA Module - Session 1 - Distributed Services and Platform Overview

Slides


IAA Module - Session 2 - SQL and NoSQL Services

Slides

  • Intro to Apache SparkSQL
  • Apache SparkSQL
  • BigQuery (Serverless SQL)
  • Google Cloud Firestore (NoSQL)

Assignments

  • Assignment 1 SQL

    • Due on Wednesday, March 15 by 11:59pm ET
    • Please complete as an individual assignment
    • Email your code and answers to d.zaratsian@gmail.com
  • Assignment 2 NoSQL

    • Due on Wednesday, March 15 by 11:59pm ET
    • Please complete as an individual assignment
    • No need to email your code for Assignment 2 unless you want specific code or syntax feedback. I'll be able to see the submitted results in the Firestore database.

IAA Module - Session 3 - Spark Data Processing & Machine Learning

Slides

  • Apache Spark Overview
  • Spark Machine Learning (MLlib)
  • ML Pipelines
  • Building and deploying Spark machine learning models
  • Considerations for ML in distributed environments
  • Spark Best Practices and Tuning
  • Spark Code Walk-through (within Google Colab)

Assignment


IAA Module - Session 4 - Cloud Machine Learning

Slides


IAA Module - Session 5 - Realtime, Streaming Systems

Slides (available by March 20th)


IAA Module - Session 6 - Cloud Services & Serverless

Slides (available by March 22nd)


References: