TPM034A - Machine learning for socio-technical systems (Q2 - 2024)

1. Introduction

Welcome to the Git repository of the TPM034A course, Machine learning for socio-technical systems. Here you will find the notebooks and source files for the lab sessions and assignments of the course. Please read the following information carefully.

1.1. Description

Machine Learning (ML) is increasingly seen as a crucial part of the puzzle to solving the socio-technical challenges of today’s networked, urbanised, knowledge societies. Successful adoption of ML, however, requires more than skilled computer scientists who do the hard-core programming: it also requires professionals who have both domain knowledge of socio-technical systems and a profound understanding of ML.

This course aims to provide students in the socio-technical domain with a profound understanding of ML. It prepares students for the challenges and questions ML will pose to them in their later careers. To this end, the course consists of three parts:

Part 1: Fundamentals of ML
Part 2: Explainability of ML
Part 3: Applications of ML for socio-technical challenges

In Part 1, students learn about ML fundamentals and methods. In these weeks, a critical technical foundation is laid for grasping the strengths and weaknesses of ML for the analysis and design of socio-technical systems.

Part 2 is devoted to the explainability of ML. For public decision-making, as well as for decision-making in high-stakes contexts such as autonomous vehicles and legal systems, the transparency and explainability of models are of critical importance. Students learn several popular explainability techniques and discuss their value for applications in socio-technical systems.

In Part 3, a rotating group of TPM scholars presents exemplary applications of ML in socio-technical systems. This serves two purposes: (1) to show where and how ML is applied for analysis and application in socio-technical systems, and (2) to deepen reflection on the impact of ML-based solutions and interventions on individuals, organisations, and society. Subsequently, students work in groups on a final project that builds on one of the presented applications. This project brings together the three parts of the course: students apply ML models and techniques to real-world data in a notebook, interpret their results, and communicate them in a presentation that takes the socio-technical setting into account.

The course consists of oral lectures, lab sessions, and assignments. The lab sessions show and reinforce how the ML models, explainable ML techniques, and ML ideas presented in the oral lectures are put into practice. They also help students gain hands-on machine learning skills. The lab sessions consist of a series of exercises in the form of Jupyter notebooks.

The course consists of two oral lectures and one lab session per week. Attending the lectures and lab sessions is highly recommended to keep up with the course, but not mandatory.

1.2. Prerequisite knowledge

We expect students to have taken undergraduate courses on the following topics before taking this course:
- Statistics and data analysis (e.g. hypothesis testing, correlation)
- Python programming (basic level)

2. Calendars

Below you will find the schedule of the lectures and lab sessions, and the calendar of publication dates and deadlines.

2.1. Course schedule

Week	Lecture	Weekday	Date	Time	Teacher	Lecture type	Topic	Materials
46	0.0				Sander	Lab session	L0: Basics of python and data analysis	Lab session 0
46	1.1	Monday	11-11-2024	13:45 - 15:45	Sander	Opening	LOs, Set-up, Scope, Planning, Expectations; What is ML?
46	1.2	Tuesday	12-11-2024	10:45 - 12:45	Sander	Oral lecture	Learning, Generalisation, Model dev, Regression, Geospatial data
46	1.3	Friday	15-11-2024	10:45 - 12:45	Sander	Lab session	L1: Discover, explore and visualise data	Lab session 1
47	1.4	Monday	18-11-2024	13:45 - 15:45	Sander	Oral lecture	Decision trees; Model performance metrics
47	1.5	Tuesday	19-11-2024	10:45 - 12:45	Sander	Oral lecture	ANNs; training; tuning
47	1.6	Friday	22-11-2024	10:45 - 12:45	Sander	Lab session	L2: Artificial Neural Networks	Lab session 2
48	1.7	Monday	25-11-2024	13:45 - 15:45	Sander	Oral lecture	Ensembles; Random forest; XGBoost
48	1.8	Tuesday	26-11-2024	10:45 - 12:45	Sander	Oral lecture	Embeddings, Causality; In vs out-of-distribution generalisation
48	1.9	Friday	29-11-2024	10:45 - 12:45	Sander	Lab session	L3: Working with image embeddings	Lab session 3
49	2.1	Monday	02-12-2024	13:45 - 15:45	Giacomo	Oral lecture	The risks of AI, Explainable AI, Transparency, Explanation by simplification
49	2.2	Tuesday	03-12-2024	10:45 - 12:45	Giacomo	Oral lecture	Global model-agnostic methods (Partial Dependence Plots, Individual Conditional Expectation Plots), LIME, SHAP
49	2.3	Thursday	05-12-2024	08:45 - 10:45	Giacomo	Lab session	L4: Explainable AI and energy prediction
50	2.4	Monday	09-12-2024	13:45 - 15:45	Giacomo	Oral lecture	Other post-hoc explainability tools (GSA, PFI), Responsible AI, XAI and climate change, Recap
50	2.5	Tuesday	10-12-2024	10:45 - 12:45	Giacomo	Lab session	L5: Explainable AI and appliance usage prediction
50	3.1	Thursday	12-12-2024	10:45 - 12:45	Amir	Oral lecture	Pitches by TPM researchers for the mini-projects
51	3.2	Monday	16-12-2024	13:45 - 15:45	Amir	Group work
51	3.3	Tuesday	17-12-2024	10:45 - 12:45	Amir	Group work
51	3.4	Thursday	19-12-2024	10:45 - 12:45	Amir	Group work
02	3.5	Monday	06-01-2025	13:45 - 15:45	Amir	Group work
02	3.6	Tuesday	07-01-2025	10:45 - 12:45	Amir	Group work
02	3.7	Thursday	09-01-2025	10:45 - 12:45	Amir	Presentations

2.2. Publication dates and deadlines

The assignments have to be submitted directly to Brightspace.

Week	Weekday	Date	Time	Event	Materials
46	Friday	08-11-2024	23:59	Lab 0 publication	Lab session 0
46	Monday	11-11-2024	16:00	Lab 1 publication	Lab session 1
46	Wednesday	13-11-2024	09:00	Assignment 1 publication	Assignment 1
47	Monday	18-11-2024	16:00	Lab 2 publication	Lab session 2
48	Monday	25-11-2024	09:00	Deadline Assignment 1	PASSED
48	Monday	25-11-2024	16:00	Lab 3 publication	Lab session 3
48	Wednesday	27-11-2024	09:00	Assignment 2 publication	Assignment 2
49	Wednesday	04-12-2024	18:00	Lab 4 publication	Lab session 4
50	Monday	09-12-2024	16:00	Lab 5 publication	Lab session 5
50	Wednesday	12-12-2024	09:00	Deadline Assignment 2

3. Q&A Forum

We use the Issues section as the Q&A platform of this course. Here you can post questions about the lectures, lab sessions, and assignments, as well as technical problems with Python. Before creating a new issue, please check that it has not already been raised by one of your classmates. Besides asking questions, you can comment on earlier issues, e.g. to continue a discussion. As an example, we have already created the first issue; see Issues.

To create a new issue (question, discussion or problem) in the course repository, follow these steps:

- (1) Go to the "Issues" section of the course repository.
- (2) Click the green "New issue" button in the upper right corner of the page.
- (3) Add an informative title to your question or problem (e.g., "Train method from sklearn does not import in my notebook").
- (4) Describe your issue clearly and concisely.
- (5) Click the green "Submit new issue" button below the description.

A lecturer or teaching assistant will then reply to your question. You are also allowed (even encouraged) to reply to questions posted by your fellow students: if you know how to help with an issue, share your thoughts!

4. Software Requirements (Lab Sessions and Assignments)

In this course, lab sessions and assignments will require programming in Python. You’ll need to prepare your computer by installing and setting up essential software in order to complete this course. Specifically, you will need a Python interpreter and a code editor. The interpreter allows your computer to run Python code, and it can be installed directly from Python’s website or through third-party software like Conda or Anaconda. The code editor is where you’ll write and execute Python scripts and notebooks; examples include VS Code, PyCharm or JupyterLab.

We offer instructions for two different setup options: (1) Anaconda, and (2) Google Colab. While you can do the activities with both, we strongly recommend Anaconda.

Anaconda: This setup involves installing Anaconda (if it isn't already installed) and configuring the required Python packages for the course. This method is commonly used in other TU Delft courses, so it may be familiar to you.
Google Colab: This option requires no installation on your computer, as you’ll work in an online environment. This makes setup easy, though it requires a stable internet connection and can sometimes be slower than working locally.

NOTES:

If you are unfamiliar with Python, we recommend completing lab session 0 after you finish the workspace setup. This lab introduces the tools you need for the other lab sessions, covering topics such as data structures, using external libraries, data exploration, and visualisation.

For those with advanced experience in managing Python versions and environments, we also provide a requirements.txt file from which you can create the corresponding environment. If you choose this method, make sure you create a virtual environment with Python 3.11 for compatibility. This option is intended for students who are comfortable with Python’s native package manager, pip, and with setting up environments manually.
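If you build your own environment from requirements.txt, a quick way to confirm that the interpreter matches the required version is to inspect sys.version_info. The sketch below is not part of the course files; the helper name python_version_ok is hypothetical, and it only compares the major and minor version against the 3.11 stated above.

```python
import sys

REQUIRED = (3, 11)  # Python version stated in the course instructions


def python_version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's (major, minor) version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_ok():
        print(f"OK: running Python {current}")
    else:
        print(f"Warning: running Python {current}; the course environment expects 3.11")
```

Comparing only (major, minor) means any 3.11.x patch release is accepted, which is usually what "Python 3.11 for compatibility" implies.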

4.1. SETUP 1: Anaconda (strongly recommended)

SETUP 1.1: Installing Anaconda

Download Anaconda for your operating system.
Run the installer and follow the instructions.
Once Anaconda is installed, you need to set up a separate virtual environment that contains everything needed for this course.

Python environments function like isolated sandboxes, each with its own versions of Python and packages. You can create, export, list, update, and remove environments as needed. Moving between environments, known as “activating” an environment, allows you to work with different setups for specific projects. When you’re finished with an environment, you can simply “deactivate” it to return to your default settings.

For this course, we have prepared the Python environment as a recipe. With this recipe, Anaconda can create the same coding environment for all of you. The recipe can be found in this repo (tpm034a_env.yaml).

SETUP 1.2: Creating an environment from an `env.yaml` file

Download and unzip the tpm034a_env file to your computer (download link)
Open Anaconda Navigator:
- (1) Go to “Environments” in the left sidebar.
- (2) Click “Import”.
- (3) From your local drive, select the file you just downloaded (*.yaml).
- (4) Give the environment a name (e.g., tpm034a).
- (5) Keep the option “Overwrite existing environment” unchecked.
- (6) Click “Import”. Depending on the speed of your connection, this step can take a while (at least 15-30 minutes).
Once the environment is ready, you have to choose a code editor that runs in this environment.

SETUP 1.3: Using the new environment

Open Anaconda Navigator:
- (1) Go to “Home” in the left sidebar.
- (2) In the drop-down menu, select your newly created environment (e.g., tpm034a).
- (3) Choose your preferred code editor, click “Install”, and then “Launch”. The course team suggests VS Code.
If everything was installed properly, your preferred editor now opens with the environment you just created.
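To double-check that your editor really runs in the new environment, you could run a short cell like the one below in a notebook or script. It is a heuristic sketch, not part of the course material: it assumes that `conda activate` sets the CONDA_DEFAULT_ENV variable, and otherwise falls back to sys.prefix, which for a Conda environment is normally the environment's own folder (e.g. a path ending in envs/tpm034a if you used that name). The function name describe_environment is hypothetical.

```python
import os
import sys


def describe_environment():
    """Return a short description of the environment this interpreter runs in."""
    # `conda activate` sets CONDA_DEFAULT_ENV; editors that select a conda
    # interpreter without activating it may leave this variable unset.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment '{conda_env}'"
    # Inside a venv, sys.prefix points to the venv and sys.base_prefix to the base install.
    if sys.prefix != sys.base_prefix:
        return f"virtual environment at {sys.prefix}"
    # Fallback: for a conda env, sys.prefix is the env folder itself (e.g. .../envs/tpm034a).
    return f"interpreter installation at {sys.prefix}"


print("Environment:", describe_environment())
print("Interpreter:", sys.executable)
```

If the printed interpreter path does not point into the environment you created, select that environment's interpreter in your editor and run the cell again.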

4.2. SETUP 2: Google Colab

For this option, you need a Google account, Google Chrome web browser, and a stable internet connection. Please follow the steps we've included below to set up the workspace.

Step 1: Download or clone this repo to your computer.
- To download: at the top of this page, click (1) the <> Code tab, then (2) the green Code button, and then (3) Download ZIP (see numbers 1, 2, and 3 in the following image). Unzip the file into a working folder of your choice.
- To clone: open a console (Terminal on macOS or CMD on Windows), navigate to the folder where you want to place this repo, and execute git clone .... The entire repo will be downloaded to your computer. (IMPORTANT: you need to have Git installed for cloning.)
Step 2: Go to http://colab.research.google.com
Step 3: Sign in with your Google account (if you are already signed in, skip this step). If you do not have a Google account, you must (temporarily) create one.
Step 4: Upload the Python notebook you want to work on to Colab. Click the "Upload" tab and then "Choose file" (see numbers 1 and 2 in the figure below). Then navigate to your working folder (Step 1) and select the Python notebook (.ipynb) you want to work on (e.g. lab_session_00.ipynb).
Step 5: Once open, click on "View" >> "Expand sections" on the menu bar.
Step 6: Importantly, each notebook has a cell that prepares the data and environment in Google Colab. Uncomment (i.e. remove the '#' from) the lines related to the Colab set-up in your notebook (see the figure below). Run this cell and wait until it has finished.
Step 7: You are all set! You can work on your notebook.
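The set-up cell in Step 6 relies on you uncommenting lines by hand. A common alternative pattern, shown here only as a sketch and not necessarily what the course notebooks do, is to let the notebook detect Colab itself by checking whether the google.colab package is available (Colab provides it; a local Anaconda setup normally does not). The names running_in_colab and IN_COLAB are hypothetical.

```python
import importlib.util


def running_in_colab():
    """Return True when the code is executed on Google Colab."""
    try:
        # Colab ships the `google.colab` package; local installs normally lack it.
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent package `google` itself is not installed.
        return False


IN_COLAB = running_in_colab()
if IN_COLAB:
    # Colab-only set-up would go here (e.g. the lines you uncomment in Step 6).
    print("Running on Google Colab")
else:
    print("Running locally: the Colab set-up lines can stay commented out")
```

With such a guard, the same notebook can run unchanged both locally and on Colab.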

Finally, note that the requirements files may be updated during the course to include more dependencies if needed.
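Because the requirements files may change during the course, you may want to compare what is installed with the pinned versions. The sketch below is a hypothetical helper, not part of the course repository. It assumes requirements.txt lists packages as simple name==version pins; lines with other specifiers are skipped, and entries with extras or URLs may be reported incorrectly. It uses only the standard-library importlib.metadata module.

```python
from importlib import metadata
from pathlib import Path


def check_requirements(lines):
    """Compare simple `name==version` pins with the installed distributions.

    Returns a dict that maps each problematic package name to a description.
    """
    problems = {}
    for raw in lines:
        # Drop comments and environment markers (e.g. "; python_version >= '3.9'").
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # empty lines and non-exact specifiers are skipped in this sketch
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, requirements.txt asks for {wanted}"
    return problems


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # assumed to sit in the current working folder
    if req_file.exists():
        issues = check_requirements(req_file.read_text().splitlines())
        for pkg, msg in issues.items():
            print(f"{pkg}: {msg}")
        print("All pinned packages match." if not issues else f"{len(issues)} mismatch(es) found.")
    else:
        print("requirements.txt not found in", Path.cwd())
```

After an update, any package reported as missing or mismatched can be reinstalled with pip (or via the updated environment file if you use Anaconda).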
