INFO9023 - Machine Learning Systems Design

The course of Machine Learning System Designs (or MLOps) aims at enabling students to truly build fully functional "user-ready" ML systems. It will look at the whole lifecycle of building a real world ML application. At the end of the course, students will be familiar with key tools and frameworks of MLOps.

We are in the early days of the AI revolution, which is expected to bring a large impact to many industries. There is a large demand for skilled engineers who are able to build ML systems.

Bringing a Machine Learning application to production requires many more efforts than solely the ML model development. Famously, there is a large hidden technical debts in designing and implementing all the components coming around your model.

This course and material contained in this repo aims at closing that technical debt by providing essential tools and best practices in MLOps.

Logistic

Teaching Staff:
- Thomas Vrancken (Instructor): t.vrancken@uliege.be
- Matthias Pirlet (Teaching Assistant): Matthias.Pirlet@uliege.be
Class time & place: Classes are held from 9:00 to 12:30 every Monday in B37 S.42 (-1/42) [Liège Sart-Tilman - Polytech]
- Note that it was changed from the original room B28 R.75 (0/75) as we were a too large group for it.
- The new room is not available on the 04/03/2024. That day we will be back in room B28 R.75 (0/75).- Support: You can ask any question by email, during (or after) classes or in person during open office hours every Monday till 18:00 in office 77B (building B28).
Communication: Main communication point will be Discord (link). Official communication might also come by email.

Agenda

Note that links to the lecture and labs slides will always be provided shortly before the actual date.

Date	Week #	Sprint #	Description	Material (gradually added)
05/02/2024	W0	Sprint 0	Course introduction	lecture slides
12/02/2024	W1	Sprint 1	Use case definition	lecture slides
19/02/2024	W2	Sprint 1	Project organisation Lab git code versioning	lecture slides lab slides
26/02/2024	W3	Sprint 2	Data processing Lab YData-profiling	lecture slides lab slides - lab notebook
04/03/2024	W4	Sprint 2	Model development(class in B28 R.75 (0/75))	lecture slides lab folder
11/03/2024 - 18/03/2024	W4	Sprint 2	MS 1 presentations	See project section
11/03/2024	W5	Sprint 3	API implementation	lecture slides guest lecture lab folder
18/03/2024	W6	Sprint 3	Model serving & Cloud infrastructure	lecture slides guest lecture lab folder
25/03/2024	W7	Sprint 4	Serving & training optimisation	lecture slides lab folder
01/04/2024	No Class	No Class	No Class	No Class
08/04/2024	W8	Sprint 4	Model Pipeline	lecture slides lab folder
15/04/2024	W9	Sprint 5	Monitoring & dashboarding	lecture slides lab folder
22/04/2024	No Class	No Class	No Class	No Class
29/04/2024	No Class	No Class	No Class	No Class
06/05/2024	W10	Sprint 5	CICD	lecture slides lab folder
13/05/2024	W11	Closing sprint	MS 2 presentations	See project section
31/05/2024	Exam			See exam section

Course culture

Pillars

Relevant: Focused on core concepts of building ML applications. Tailored choice of current best practices.
Practical: Concrete Labs, resources, real life examples, time to experiment, support line, …
Engagin: Interactive class session. Healthy tempo (break out exercises, QA, …). … And lots of memes.

Learning Streams

There are 3 points of learnings for students following this course:

Lectures: Present theoretical concepts and practices. Usually the first part of each class.
Labs: Labs are there to learn how to use key tools widely used to build ML systems. They've been cherry picked based on usability, performance, popularity and accessibility.
Project: You will apply the theoretical and technical learnings in one large group project. You will build a ML application in an engaging way - you're in the driving seat.

Topics covered

Here is a list of all topics covered during this course (and links):

Week & material	Topics
Lecture W0 - Course introduction	Key concepts of ML Systems Design Roles & organisation of ML projects
Lecture W01 - Use case definition	Deep dive in real world use cases (Real estate valuation assistant, Rug cutting detection & Data driven sales) Project phases & challenges ML Canvas (project definition)
Lecture W02 - Project organization	Waterfall, Agile, Scrum & Kanban way of working Data sources types & identification (internal & external) Data labelling techniques Git flow & trunk based code versioning PEP8
Lab W02 - Git code versioning	Introduction to Git and basic commands GitFlow ohshitgit VSCode
Lecture W03 - Data preparation	Basic data format and models Databases (OLTP & OLAP) Exploratory Data Analysis (EDA) Data cleaning & feature engineering
Lab W03 - YData Profiling	Basic EDA using YDataProfiling
Lecture W04 - Model experimentation & containerisation	Model testing (offline & online) Virtual environments Virtual machines Docker Kubernetes
Lab W04 - Containerisation	Hands-on for Virtual environments Virtual machines Docker Kubernetes
Lecture W05 - API Implementation	API, REST & RPC
Guest lecture W05 - Connexion (Robbe Sneyders)	Connexion Concurrency, parallelism & async WSGI, ASGI & middleware Working open source
Lab W05 - Flask	Build a simple local Flask API
Lecture W06 - Model serving & Cloud infrastructure	Batch vs real-time serving Asynchronous vs synchronous serving Cloud vs on-prem deployment
Guest lecture W06 - Cloud Infrastructure (Philippe Modard)	Kaggle platform Cloud computing GCP services Vertex AI
Lab W06 - Cloud Run	Deploy an API in the Cloud using Cloud Run
Lecture W07 - Serving & training optimisation	Model serving optimisation Parallel and distributed training (data, model and pipeline) Triton Inference Server Model complexity optimisation
Lab W07 - Triton Inference Server	Deploying a local Triton Inference Server
Lecture W08 - Model pipelines	Microservice architecture ML Model pipeline Pipeline orchestrators
Lab W08 - Docker compose	Create an ML oriented microservice system with Docker Compose
Lecture W09 - Monitoring & dashboarding	Logging vs monitoring Resource vs performance monitoring Model performance monitoring Drift types: data, target and concept
Lab W09 - Google Cloud Storage	Download and upload files using Google Cloud storage
Lab W09 - Streamlit	Create an ML oriented dashboard using Streamlit and deploy it to Cloud Run
Lecture W10 - CICD	Continuous integration, Continuous development & Continuous deployment Code testing (unit, integration, system & acceptance) Pytest & pylint Multi-environment management (dev, tst, acc, prd) Infrastructure as code Terraform AI AI Legislation A guide to Trustworthy
Lab W10 - Gihub Actions	Use Gihub Actions for CICD

And here is an overview of how the sprints are organised:

Project

⚠️ EDIT: The format was updated for the first year to have only 2 milestone presentations in total. See the milestones section.

All the project info can be found in the project/project_description.pdf document.

Important to highlight that you are the decision makers for the design of your project 👐 Design it in a coherent way. Spend time on features that make sense. Don't see the workpackages as a long checklist, rather on possible points to implement.

For example, if you implement a computer vision model you might spend more time on the model serving rather than the data exploration. Or conversely, if you need to get tabular data from external APIs you might want to spend more time on the data construction rather than on model serving optimisation. Those are design choices you need to make depending on your project - and as always make sure to ask questions to the teaching staff if you have any (ideally at the end of lectures).

Milestones

The main way to handover the results of your projects will be during the 2 milestones.

The first Milestone will be used to present the work you did during sprint 1 & 2. The main purpose of the first Milestone is to provide feedback and guidance on the project. The final project grade will be determined at the end of the project. If there are any issues highlighted during the milestone there is room for fixing it in the later development of the project.

The second and final Milestone will be used to present the overall project once completed. It will be held during the last class of the semester, in the classroom, instead of a lecture. Teams are welcomed to attend other project presentations.

Make it your own! Focus on what is relevant and interesting. You are free to decide which material (if any) you will use to present your results (short slide deck, demo, show codes, ...).

The project counts for 70% of the final grade, while the exam counts for the remaining 30%.

Handovers

At each milestone, you will show your results through a 10 minutes presentation followed by a 5 minutes QA from the teaching staff.
Next to that, you will submit the codes you implemented so far by creating a pull requests in Github and asking the teaching staff to review it. Make sure to clearly document what is important in the code base for the teaching staff to look at. By default, the teaching staff will focus on the README and the pull request description. Send your codes before the milestone meeting and by an email with a link to the pull request to the teaching staff (t.vrancken@uliege.be and Matthias.Pirlet@uliege.be)

Make sure to send an email with a link to the pull request to the teaching staff when it is ready. Just tagging the teaching staff on Github is not enough.

Practicals Milestone 1

Milestone 1 presentations will happen between the 11/03/2024 and the 18/03/2024.

⚠️ Each team needs to book a meeting slot to present their Milestone. Link will be shared by email (if a group has no students in the mailing list please let the teaching staff know.). There are two options for those meetings:

You can book an online meeting via a Google Meets between 13:00 and 17:00 in the days available for each specific milestone (for exact days see calendar invite and timing section)
You can present it in the classroom from 11:30 to 12:30 right after the lectures in the two weeks following the closing sprint of a milestone (for exact days see calendar invite and timing section). As the class will be busy with presentations after the course those weeks, it means that other teams can still use the time to work on their projects but from another location. Support on the project will then be more ad-hoc those weeks. You can still raise questions by email, Discord or during the open office hours every Monday till 18:00 in office 77B in building B28.

Practicals Milestone 2

⚠️ Milestone 2 presentations will happen in the classroom during the last class on the 13/05/2024.

The schedule for the presentations can be found in this sheet. If your group is not in there or if the time does not work for your group please let the teaching staff know.

Attendace: Note that it is fine if one of your team member can really not attend the presentation. If it is not possible for a more people of your team to present that day you can also request to do it another day in an online meeting presentation.

Exam

The exam will take the form of an oral exam in whih you will have to answer a series of 4-5 questions based on a case study. You will have 30 minutes to prepare your answer and then 20 minutes to present them. The questions will be based on the course material and tools you used during the project. You can see what you need to know and what is not needed at the end of each lecture. You can also find a practice exam in the exams/practice_exam_june_2024.pdf document.

The PhD students are exempt from the exam. They will be graded on the project only.

You will receive an email with a Doodle link to select your preferred exam slot. It will be first come first serve. The exam will take place on 31/05/2024.

You can first go to the "preparation room" (B28 R.18 (0/18) [Liège Sart-Tilman - Polytech]), where you will be handed your exam page and some paper to take notes. You will have 30 minutes to prepare it, before going to the room where you will actually take the exame (office 77B in B28).

⚠️ You are not allowed to use your computer or smartphone once you received your exam assignment, nor can you use extra notes or material.

The exam counts for 30% of the final grade, while the project counts for the remaining 70%.

Resources

Similar classes

University of Bari.
- Paper: "Teaching MLOps in Higher Education through Project-Based Learning." arXiv preprint arXiv:2302.01048 (2023) (link)
- Lanubile Filippo, Silverio Martínez-Fernández and Luigi Quaranta
Stanford University
- CS 329S: Machine Learning Systems Design (link)
- Chip Huyen
Carnegie-Mellon University
- Machine Learning in Production / AI Engineering (link)
- Christian Kästner

General MLOps resources

Machine Learning Engineering for Production (MLOps) Specialization (Coursera, Andrew Ng) (link)
Made with ML (link)
Marvelous MLOps (link)
Hidden Technical Debt in Machine Learning System - Google (link)

Books

Building Machine Learning Powered Applications: Going from Idea to Product (Emmanuel Ameisen)
Introducing MLOps (Mark Treveil, Nicolas Omont, Clément Stenac et al.)
Machine Learning Design Patterns (Valliappa Lakshmanan, Sara Robinson, Michael Munn)

Credits

Many people helped and supported through the preparation of this course. Special thanks to:

Gilles Louppe
Filippo Lanubile
Luigi Quaranta
Robbe Sneyders
Philippe Modard
Sebastian Wehkamp
Ruwan Lambrichts
Jason Li
Thomas Janssens

ThomasVrancken/info9023-mlops