Distributed Services for Machine Learning - Dan Zaratsian, March 2023
- Intro and Module Agenda
- Trends in AI/ML
- Overview of Tools and Services
- ML Architectures
- Google Colab Notebook Environment
- Google BigQuery Sandbox
- Intro to Apache SparkSQL
- Apache SparkSQL
- BigQuery (Serverless SQL)
- Google Cloud Firestore (NoSQL)
Assignments
- Assignment 1
- Due on Wednesday, March 15 by 11:59pm EST
- Please complete as an individual assignment
- Email your code and answers to d.zaratsian@gmail.com
- Assignment 2
- Due on Wednesday, March 15 by 11:59pm EST
- Please complete as an individual assignment
- No need to email your code for assignment #2 unless you want specific code / syntax feedback. I'll be able to see the submitted results within the Firestore DB.
- Apache Spark Overview
- Spark Machine Learning (MLlib)
- ML Pipelines
- Building and deploying Spark machine learning models
- Considerations for ML in distributed environments
- Spark Best Practices and Tuning
- Spark Code Walk-through (within Google Colab)
Assignment
- Assignment 3 - SparkML
- Due on Tuesday, March 21 by 11:59pm EST
- Please complete as an individual assignment
- Email your code to d.zaratsian@gmail.com
- Overview of Google Cloud Machine Learning Services
- AutoML
- BigQueryML
- Google Vertex AI Platform
- Google Vertex Notebooks (Workbench)
- Google Deep Learning Containers
Slides (Slides will be live by March 20th)
- Apache Kafka
- Google PubSub
- Spark Structured Streaming and Spark Streaming
- Apache Beam and Google Dataflow
Slides (Slides will be live by March 22nd)
- Overview of Google Cloud Services
- Cloud Functions
- Cloud Run
- Docker
- Google Colab Notebooks
- Google Vertex AI Platform
- Google Vertex Notebooks (Workbench)
- Apache Zeppelin
- Apache Spark Docs
- Google BigQuery
- Google BigQuery Sandbox
- Apache Hive Docs
- Google Cloud Firestore
- Apache HBase Docs
- Apache Phoenix Docs
- Google Cloud PubSub
- Apache Kafka Docs
- Apache NiFi Docs
- Docker Docs
- Google Deep Learning Containers