02 - Regression Analysis, Observational Studies, Read the Stats, and Supervised Learning

In this homework, you will work with two popular frameworks: (1) sklearn (short for scikit-learn) and (2) statsmodels. In addition, you will create some basic visualizations to explain your findings. Regression analysis and supervised learning are two core skills for a data scientist, so this homework is good preparation for practical work.

The homework consists of three tasks, which are described in the hw2.ipynb notebook.

For each task, please provide both a written explanation of the steps you followed and the corresponding code. Keep in mind that writing the explanation can help you in two ways:

  1. Clarifying the steps in your mind before you write the actual code
  2. Earning you points if the description is correct, even if your code has issues

Submission Guidelines

You are expected to solve the homework as a team of three, which you specified in the project registration form. By the homework submission deadline, each team should have a single shared private GitHub repo under the epfl-ada organization, containing the Jupyter Notebook with the solution. Please follow the instructions below to create your team repo and start working on the homework:

  1. One team member should follow this link and create a team by adding the prefix hw2_ to the exact team name you specified for Homework 1.
  2. Creating the team automatically creates a dedicated private repo. At this point, the remaining two team members should follow the same link and join the team. Make sure you are joining the correct team by checking your team members' GitHub accounts: there might be teams with similar or identical names.
  3. There is no simple automated way to transfer the materials for Homework 2 from the public course repository into your private team repository. To get started, we suggest that you manually pull the homework materials from the course repository to your local machine, copy them into your local team repository, and push to the remote.
  4. Afterwards, keep collaborating on the homework as a team in your shared private repository.
  5. Important: you may organize your work on the homework as you see fit (multiple branches, etc.), but by the submission deadline your repo must contain exactly one branch. Push the solved version of the homework to this branch by November 20th, 23:59!
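
The manual copy in step 3 can be partly scripted. The sketch below is one possible approach, not an official course tool: the function name `copy_homework_materials` and both directory paths are hypothetical placeholders, and it assumes you already have local clones of the course repository and of your team repository (Python 3.8 or later).

```python
import shutil
from pathlib import Path


def copy_homework_materials(course_hw_dir, team_repo_dir):
    """Copy the homework folder from a local course clone into the team repo.

    Both arguments are hypothetical local paths chosen by you. Any `.git`
    directory inside the source is skipped, so the team repo keeps its own
    history. Returns the files now present in the team repo (outside `.git`),
    as sorted paths relative to the team repo root.
    """
    src = Path(course_hw_dir)
    dst = Path(team_repo_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Homework folder not found: {src}")

    # dirs_exist_ok=True lets us copy into an already cloned (existing) repo.
    shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns(".git"),
        dirs_exist_ok=True,
    )

    return sorted(
        p.relative_to(dst).as_posix()
        for p in dst.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(dst).parts
    )


# Example call (placeholder paths, adjust to your own machine):
# copy_homework_materials("course-repo/Homework/02", "hw2_myteam")
```

After the copy, stage, commit, and push from inside the team repository as usual (`git add`, `git commit`, `git push`).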