What is discriminatory machine learning?
Discriminatory machine learning studies how machine learning models can discriminate through the data they are trained on. Discrimination can arise from using variables that are considered discriminatory (e.g. sex, race) or variables correlated with them (e.g. neighbourhood).
This repository relates to a course at Aalto University focused on these challenges, and demonstrates two approaches to managing discrimination.
What has been done?
I explore the relationship between household income and immigrant status (that is, being born in a country other than the one the person currently lives in). I used European Values Study (EVS) data to conduct the analysis, comparing two methods for reducing discrimination. The data can be obtained from GESIS free of charge for academic purposes.
What methods were chosen for comparison?
- Pope & Sydnor (2011) propose fitting a linear regression model that includes the discriminatory variables and then, when making predictions, replacing the values of these variables with a common value (such as their average) for everyone.
- Calders & Verwer (2010) propose training separate predictive models for each value of the discriminatory variable.
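Both approaches can be sketched on a small toy dataset. This is a minimal, dependency-free Python illustration under stated assumptions, not the code used in this project: the outcome is a continuous income, the discriminatory variable is the number of immigrants in the household (0, 1 or 2), the other columns stand for education and children, and the data and helper names (`fit_ols`, `pope_sydnor_predict`, `split_models_predict`, `ROWS`, `INCOMES`) are hypothetical. For Pope & Sydnor, the model is fitted with the discriminatory variable included and predictions use that variable's sample mean for everyone. For Calders & Verwer, only the core idea described above is shown (one model per value of the discriminatory variable); the original paper works with naive Bayes classifiers and adds further adjustment steps that are omitted here.

```python
from statistics import mean


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column is prepended to every row. The system is solved
    with Gauss-Jordan elimination and partial pivoting.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def drop(row, idx):
    """Return the row without the column at position idx."""
    return row[:idx] + row[idx + 1:]


def pope_sydnor_predict(rows, targets, sensitive_idx):
    """Fit WITH the sensitive variable, then predict with it replaced by
    its sample mean, so it no longer separates individuals."""
    coefs = fit_ols(rows, targets)
    avg = mean(r[sensitive_idx] for r in rows)
    neutral = [r[:sensitive_idx] + [avg] + r[sensitive_idx + 1:] for r in rows]
    return [predict(coefs, r) for r in neutral]


def split_models_predict(rows, targets, sensitive_idx):
    """Fit one model per value of the sensitive variable (without that
    column) and predict each row with its own group's model."""
    groups = {}
    for r, y in zip(rows, targets):
        xs, ys = groups.setdefault(r[sensitive_idx], ([], []))
        xs.append(drop(r, sensitive_idx))
        ys.append(y)
    models = {g: fit_ols(xs, ys) for g, (xs, ys) in groups.items()}
    return [predict(models[r[sensitive_idx]], drop(r, sensitive_idx))
            for r in rows]


# Hypothetical toy data: [immigrants in household, education level, has children].
ROWS = [[0, 1, 0], [0, 2, 1], [0, 3, 0], [0, 2, 0],
        [1, 1, 1], [1, 2, 0], [1, 3, 1], [1, 3, 0],
        [2, 1, 0], [2, 2, 1], [2, 3, 1], [2, 1, 1]]
# Invented noise-free incomes with a built-in penalty of 200 per immigrant.
INCOMES = [1000 - 200 * i + 300 * e + 100 * c for i, e, c in ROWS]

if __name__ == "__main__":
    print("Pope & Sydnor:", [round(p) for p in pope_sydnor_predict(ROWS, INCOMES, 0)])
    print("Separate models:", [round(p) for p in split_models_predict(ROWS, INCOMES, 0)])
```

In the Pope & Sydnor sketch, the coefficients on education and children are estimated while controlling for immigrant status, so they do not absorb its effect as proxies; replacing the status with a common value then removes its direct effect from the predictions. In the separate-models sketch, each group keeps its own intercept, so on this toy data the group-level income differences are reproduced in the predictions.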
Which variables were used?
- Household income (adjusted at the country level, variable v353M_ppp)
- Immigrant status of the respondent and their partner
- Combined level of education
- Whether the household has children
- The respondent's left-right political orientation
How was discrimination measured?
We confirmed that household income differs by immigration status by comparing the income of households with no immigrants against that of households with one or two immigrants.
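A minimal sketch of this comparison, assuming the gap is summarised by the difference in mean income and Welch's t statistic. The README does not state which statistical test was used, so both the test choice and the function name `income_gap` are assumptions; immigrant counts are assumed to be coded 0, 1 or 2 per household.

```python
from math import sqrt
from statistics import mean, variance


def income_gap(incomes, immigrant_counts):
    """Compare households with no immigrants (count 0) against households
    with one or two immigrants (count 1 or 2).

    Returns (difference in mean income, Welch's t statistic).
    """
    native = [y for y, n in zip(incomes, immigrant_counts) if n == 0]
    immigrant = [y for y, n in zip(incomes, immigrant_counts) if n in (1, 2)]
    diff = mean(native) - mean(immigrant)
    # Welch's t: difference in means divided by the unpooled standard error.
    se = sqrt(variance(native) / len(native)
              + variance(immigrant) / len(immigrant))
    return diff, diff / se
```

Turning the t statistic into a p-value needs the Welch-Satterthwaite degrees of freedom and the t distribution; with SciPy available, `scipy.stats.ttest_ind(native, immigrant, equal_var=False)` performs the whole test.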
What was observed?
- On the raw data, there is a clear, statistically significant difference in household income, indicating discrimination.
- The Pope & Sydnor method reduced the impact of discrimination but did not remove it.
- The Calders & Verwer method did not reduce discrimination.