What is discriminatory machine learning?

Discriminatory machine learning concerns the ways in which machine learning models can discriminate through the data they are trained on. Discrimination can arise when a model uses variables that are themselves considered discriminatory (e.g. sex, race) or variables correlated with them (e.g. neighbourhood).

This repository relates to a course at Aalto University focused on these challenges, and demonstrates two approaches to managing discrimination.

What has been done?

I explore the relationship between household income and being an immigrant (that is, being born in a country other than the one the respondent currently lives in). I used European Values Study data to conduct my analysis, comparing two methods. The data can be obtained from GESIS, free of charge for academic purposes.

What methods were chosen for comparison?

Pope & Sydnor (2011) propose fitting a linear regression model that includes the discriminatory variables and then replacing the values of those variables when making predictions.
Calders & Verwer (2010) propose training separate predictive models for each value of the discriminatory variable.
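
The two approaches can be illustrated with a minimal sketch on toy data. Everything here is an assumption made for illustration: the numbers are invented, the helper names (`fit_ols`, `predict`, `predict_blinded`, `group_models`) are hypothetical, the replacement value is taken to be the sample mean of the immigrant indicator, and plain linear regressions stand in for whatever models the cited papers and this repository actually use.

```python
from statistics import mean


def fit_ols(rows, targets):
    """Fit ordinary least squares with an intercept via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Intercept plus the weighted sum of the predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Toy data (invented): (education level, immigrant indicator) -> income.
rows = [(1, 0), (2, 0), (3, 0), (1, 1), (2, 1), (4, 1)]
incomes = [1200.0, 1400.0, 1600.0, 900.0, 1100.0, 1500.0]

# Approach 1 (in the spirit of Pope & Sydnor): fit the full model, then
# replace the discriminatory variable with a constant when predicting.
# Assumption: the constant is the sample mean of the indicator.
coefs = fit_ols(rows, incomes)
mean_immigrant = mean(r[1] for r in rows)


def predict_blinded(education):
    return predict(coefs, (education, mean_immigrant))


# Approach 2 (in the spirit of Calders & Verwer): train one model per
# value of the discriminatory variable, using only the other predictors.
group_models = {}
for g in (0, 1):
    g_rows = [(edu,) for edu, imm in rows if imm == g]
    g_incomes = [y for (edu, imm), y in zip(rows, incomes) if imm == g]
    group_models[g] = fit_ols(g_rows, g_incomes)
```

With the first approach, `predict_blinded` returns the same income for every household with a given education level, whatever its immigrant status. With the second, each group keeps its own coefficients, so differences between groups stay explicit and can be inspected or adjusted.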

What were the variables used?

Household income (adjusted at the country level, v353M_ppp)
Immigrant status of the respondent and their partner
Combined level of education
Whether the household has children
The left-right political leaning of the respondent

How was discrimination measured?

I confirmed that household income differed by immigration status by comparing households with no immigrants to households with one or two immigrants.
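
The comparison can be sketched as a difference in group means. This is only an illustration under stated assumptions: the records below are invented toy values, not survey data, and the function name `income_gap` is hypothetical rather than taken from this repository.

```python
from statistics import mean

# Invented toy records: (number of immigrants in the household, income).
households = [
    (0, 2400.0),
    (0, 2100.0),
    (0, 2700.0),
    (1, 1900.0),
    (2, 1700.0),
    (1, 2100.0),
]


def income_gap(records):
    """Mean income of households without immigrants minus the mean
    income of households with one or two immigrants."""
    without = [income for n, income in records if n == 0]
    with_immigrants = [income for n, income in records if n >= 1]
    return mean(without) - mean(with_immigrants)


gap = income_gap(households)
```

A positive `gap` means that households without immigrants have the higher average income in the sample. A significance test would be needed before reading the gap as more than sampling noise.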

What was observed?