cellophane

Algorithmic bias auditing tool for use when the protected class is unobserved in data


Overview

A practical challenge for assessing disparity along protected class lines in algorithmic systems is that protected class membership is often not observed in the data.

To address this challenge, various methods have been developed to impute the protected class using proxies found in the original dataset. The most (in)famous of these is BISG (Bayesian Improved Surname Geocoding), which uses surname and geolocation to predict race.
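To sketch the idea (this is not part of this package), one common BISG formulation combines the two proxies via Bayes' rule, assuming surname and geography are conditionally independent given race: P(race | surname, geo) ∝ P(race | surname) · P(race | geo) / P(race). All table names and numbers below are hypothetical toy values; real implementations use Census surname lists and block-level geographic tabulations.

```python
import numpy as np

# Toy proxy tables (hypothetical values); real BISG uses Census
# surname lists and block-level geographic race distributions.
RACES = ["white", "black", "api", "hispanic"]
P_RACE_GIVEN_SURNAME = {"GARCIA": np.array([0.05, 0.01, 0.02, 0.92])}
P_RACE_GIVEN_GEO = {"060750201": np.array([0.40, 0.10, 0.35, 0.15])}
P_RACE = np.array([0.60, 0.13, 0.06, 0.21])  # overall population shares

def bisg_posterior(surname: str, geo: str) -> np.ndarray:
    """P(race | surname, geo) ∝ P(race|surname) * P(race|geo) / P(race),
    assuming surname and geography are independent given race."""
    unnorm = P_RACE_GIVEN_SURNAME[surname] * P_RACE_GIVEN_GEO[geo] / P_RACE
    return unnorm / unnorm.sum()

print(dict(zip(RACES, bisg_posterior("GARCIA", "060750201").round(3))))
```

The conditional-independence assumption is exactly the kind of ad hoc modelling choice discussed below: the posterior probabilities it produces are estimates, not observations.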

These methods are controversial for various socio-technical and statistical reasons. The main statistical challenge is that the estimation of protected class membership from proxies is uncertain and subject to ad-hoc modelling choices.

Bias metrics calculated from proxy-based estimates without accounting for this uncertainty amount to spurious point estimates. Conclusions reached using these point estimates will be vulnerable to challenge.

This auditing package implements the algorithms described in Kallus, Mao, and Zhou (2020) to provide meaningful estimates of disparity that account for the uncertainty of estimation. Instead of generating point estimates, it generates the range of all possible disparities consistent with the data, known as a partial identification set. A tight set allows for robust conclusions even though protected class membership was not observed in the primary dataset. A wide set generally means that the proxies are not informative enough to draw conclusions.
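To make the idea concrete, below is a minimal, self-contained sketch of Manski-style bounds on the demographic disparity E[Y | A=1] − E[Y | A=0] when the protected class A is unobserved and only the proxy-implied membership rate within each proxy stratum is known. The function name `disparity_bounds` and the data layout are illustrative assumptions, not this package's API, and the actual algorithms in Kallus, Mao, and Zhou (2020) are more general.

```python
import numpy as np

def disparity_bounds(y, stratum, p1_by_stratum):
    """Bound E[Y|A=1] - E[Y|A=0] when A is unobserved and only
    P(A=1 | proxy stratum) is known. Within each stratum, membership
    may correlate arbitrarily with outcomes, so the extremes assign
    the k largest (or k smallest) outcomes to A=1.
    Illustrative sketch only -- not the cellophane API."""
    hi1 = hi0 = lo1 = lo0 = 0.0   # outcome sums under extreme assignments
    n1 = n0 = 0                   # class sizes implied by the proxies
    for s, p in p1_by_stratum.items():
        ys = np.sort(y[stratum == s])
        k = int(round(p * len(ys)))          # A=1 count in this stratum
        n1 += k
        n0 += len(ys) - k
        hi1 += ys[len(ys) - k:].sum()        # A=1 takes the k largest ...
        hi0 += ys[: len(ys) - k].sum()       # ... A=0 gets the rest
        lo1 += ys[:k].sum()                  # A=1 takes the k smallest
        lo0 += ys[k:].sum()
    return lo1 / n1 - lo0 / n0, hi1 / n1 - hi0 / n0

# Toy example: two proxy strata with different outcome rates.
rng = np.random.default_rng(0)
stratum = rng.integers(0, 2, size=1000)
y = rng.binomial(1, np.where(stratum == 0, 0.3, 0.6)).astype(float)
print(disparity_bounds(y, stratum, {0: 0.2, 1: 0.7}))
```

If the interval returned excludes zero, a disparity exists under every assignment of class membership consistent with the proxies; if it is wide and straddles zero, the proxies alone cannot settle the question.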

Visualization

In addition to calculating partial identification sets, this package contains plotting functions to visualize them easily:

(Figure: example plot of a partial identification set.)
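I have not reproduced the package's own plotting API here; the snippet below is a generic matplotlib sketch of how such a plot typically looks, drawing one interval per disparity metric, with all metric names and bound values being hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical bounds for two disparity metrics (illustrative values).
metrics = ["demographic disparity", "TPR disparity"]
lower = [-0.05, 0.02]
upper = [0.18, 0.11]

fig, ax = plt.subplots(figsize=(6, 2))
for i, (lo, hi) in enumerate(zip(lower, upper)):
    ax.plot([lo, hi], [i, i], lw=4)         # the identification interval
ax.axvline(0.0, color="gray", ls="--")      # zero-disparity reference
ax.set_yticks(range(len(metrics)))
ax.set_yticklabels(metrics)
ax.set_xlabel("disparity")
plt.tight_layout()
plt.show()
```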

Installation

To install cellophane, clone the repository and run:

$ python setup.py install

A pip package is coming shortly.

Demo

A short demo can be viewed here.

Docs

Documentation can be viewed here.

Understanding the underlying algorithms is important; read about them in Kallus, Mao, and Zhou (2020).