faircause
The R package faircause can be used for performing Causal Fairness Analysis and implements the methods described in the paper Causal Fairness Analysis (Plecko & Bareinboim, 2022). We refer the reader to the manuscript for the full theoretical details and methodology. Below we offer quick installation instructions and a worked example to help the user get started.
Installation
You can install faircause from this GitHub repository by using the devtools package:
devtools::install_github("dplecko/CFA")
Please note that faircause is currently at its first version, 0.0.0.9000, meaning that it has not yet been thoroughly tested. Any issues and bug reports are warmly welcomed and much appreciated.
Example
We show an example of how to use the faircause package on the US Government Census 2018 dataset collected by the American Community Survey. The dataset contains information on 204,309 employees of the US government, including demographic information (age, race, location, citizenship), education- and work-related information, and yearly earnings. The protected attribute we consider in this case is sex (male, female).
A data scientist analyzing the Census dataset observes the following:
library(faircause)
census <- head(faircause::gov_census, n = 20000L)
TV <- mean(census$salary[census$sex == "male"]) -
  mean(census$salary[census$sex == "female"])
TV
#> [1] 15053.69
In the first step, the data scientist computed that the average disparity in yearly salary, as measured by the TV, is $15,053.69.
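For readers new to the measure, the TV (total variation) is simply the difference in conditional expectations, TV = E[Y | X = male] - E[Y | X = female]. The following self-contained sketch illustrates this on synthetic data (the toy variables and the built-in gap of 5,000 are made up for illustration and are not part of the Census dataset):

```r
# Illustrative sketch of the TV measure on synthetic data.
set.seed(1)
n <- 10000
sex <- sample(c("male", "female"), n, replace = TRUE)
# simulate salaries with a built-in group difference of 5000
salary <- 40000 + 5000 * (sex == "male") + rnorm(n, sd = 10000)
toy <- data.frame(sex = sex, salary = salary)

# TV = E[Y | X = male] - E[Y | X = female]
tv <- mean(toy$salary[toy$sex == "male"]) -
  mean(toy$salary[toy$sex == "female"])
tv  # close to the simulated gap of 5000
```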
The data scientist has read the Causal Fairness Analysis paper and now wants to understand how this observed disparity relates to the underlying causal mechanisms that generated it. To this end, he constructs the Standard Fairness Model (see Plecko & Bareinboim, Definition 4) associated with this dataset:
X <- "sex" # protected attribute
Z <- c("age", "race", "hispanic_origin", "citizenship", "nativity",
       "economic_region") # confounders
W <- c("marital", "family_size", "children", "education_level", "english_level",
       "hours_worked", "weeks_worked", "occupation", "industry") # mediators
Y <- "salary" # outcome
Based on this causal structure of the variables, the data scientist now performs Causal Fairness Analysis using the fairness_cookbook() function exported from the faircause package:
# decompose the total variation measure
set.seed(2022)
tvd <- fairness_cookbook(data = census, X = X, W = W, Z = Z, Y = Y,
                         x0 = "female", x1 = "male")
# visualize the x-specific measures of direct, indirect, and spurious effect
autoplot(tvd, decompose = "xspec", dataset = "Census 2018")
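The numeric estimates behind the plot can also be inspected on the returned object itself. The snippet below assumes the tvd object from the call above; str() is base R and works on any object, though the internal slot names it reveals are not a documented part of the package API:

```r
# Peek at the top-level structure of the fairness_cookbook() result
# (assumes `tvd` from the previous chunk; slot names may vary by version).
str(tvd, max.level = 1)
```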
The data scientist concludes that there is a substantial cancellation between the direct and indirect effects, namely:
- the direct effect explains $10,300 of the observed disparity (that is, females would be paid more had they been male in this case)
- the indirect effect accounts for -$6,400 (cancelling out part of the direct effect)
- the spurious effect accounts for $1,000 of the observed variation
In particular, the dataset might show evidence of disparate treatment, which needs further investigation.