This repository provides resources, tools, and notebooks for Fairness, Ethics, and Explainability in AI and ML.
- Introduction
- Classification
- Legal background and normative questions
- Causality
- Testing discrimination in practice
- A broader view of discrimination
- Measurement
- Algorithmic interventions
- Appendix: Technical background
- Berkeley CS 294 (2017): Fairness in machine learning
- Cornell INFO 4270 (2017): Ethics and policy in data science
- Princeton COS 597E (2017): Fairness in machine learning
"Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them — and the rich structure of this combinatorial space."
OpenAI introduced Microscope – a collection of visualizations of layers and neurons of several common deep learning models that are often studied in interpretability. Microscope makes it easier to analyze the features that form inside these neural networks.
Source: OpenAI
Lime is about explaining what machine learning models are doing. It supports explaining individual predictions for text classifiers, classifiers that act on tables (NumPy arrays of numerical or categorical data), or images, via a package called lime (short for local interpretable model-agnostic explanations). Lime is based on the work presented in the paper "'Why Should I Trust You?': Explaining the Predictions of Any Classifier" (2016) by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin.
Source: Lime, GitHub
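The snippet below is a minimal sketch of explaining a single text-classifier prediction with lime. The 20 newsgroups data, the scikit-learn pipeline, and the two category names are illustrative choices, not part of the library; any model exposing a `predict_proba`-style function works.

```python
# Minimal sketch: explain one prediction of a scikit-learn text classifier with LIME.
# The dataset, pipeline, and class names are illustrative assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

categories = ["alt.atheism", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=categories)

# Any black-box model works, as long as it exposes a predict_proba-style function.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(train.data, train.target)

explainer = LimeTextExplainer(class_names=categories)
explanation = explainer.explain_instance(
    train.data[0],              # the document to explain
    pipeline.predict_proba,     # black-box prediction function
    num_features=6,             # number of words to include in the explanation
)
print(explanation.as_list())    # [(word, weight), ...] local explanation
```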
Model Interpretability for PyTorch. [Tutorials]
- Supports interpretability of models across modalities including vision, text, and more
- Supports most types of PyTorch models and can be used with minimal modification to the original neural network
- Open source, generic library for interpretability research. Easily implement and benchmark new algorithms
Source: Captum
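Below is a minimal sketch of attributing a prediction to input features with Captum's Integrated Gradients; the toy two-layer model, the random input, and the all-zeros baseline are stand-ins for whatever PyTorch model and data you already have.

```python
# Minimal sketch: feature attribution with Captum's Integrated Gradients.
# The toy model, random input, and zero baseline are illustrative assumptions.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.net(x)

model = ToyModel().eval()
inputs = torch.randn(1, 4, requires_grad=True)   # one example with 4 features
baseline = torch.zeros(1, 4)                     # reference point for attribution

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    inputs, baselines=baseline, target=1, return_convergence_delta=True
)
print(attributions)   # per-feature contribution to the class-1 score
print(delta)          # approximation error of the integral
```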
A Framework for Explaining Predictions of NLP Models by Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, and Sameer Singh
AllenNLP Interpret is a toolkit built on top of AllenNLP for interactive model interpretations. The toolkit makes it easy to apply gradient-based saliency maps and adversarial attacks to new models, as well as develop new interpretation methods. It contains three components: a suite of interpretation techniques applicable to most models, APIs for developing new interpretation methods (e.g., APIs to obtain input gradients), and reusable front-end components for visualizing the interpretation results.
Source: AllenNLP Interpret
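A rough sketch of the gradient-based saliency workflow follows. The model archive URL is a hypothetical placeholder, and the predictor name and `"sentence"` input key depend on the model you actually load.

```python
# Rough sketch: gradient-based saliency with AllenNLP Interpret.
# The archive URL is a hypothetical placeholder; the predictor name and the
# "sentence" input key depend on the model you actually load.
from allennlp.predictors import Predictor
from allennlp.interpret.saliency_interpreters import SimpleGradient

predictor = Predictor.from_path(
    "https://example.org/sentiment-model.tar.gz",   # placeholder model archive
    predictor_name="text_classifier",
)

interpreter = SimpleGradient(predictor)
saliency = interpreter.saliency_interpret_from_json(
    {"sentence": "a visually stunning but emotionally hollow film"}
)
print(saliency)   # per-token gradient magnitudes, grouped by instance
```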
ELI5 is a Python library which allows you to visualize and debug various machine learning models using a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.
- Source: ELI5 Documentation
- Tutorials: Notebook
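As a minimal sketch, the snippet below inspects the learned weights of a scikit-learn linear classifier with ELI5; the iris dataset and logistic-regression model are illustrative choices.

```python
# Minimal sketch: inspect a scikit-learn classifier's weights with ELI5.
# The iris dataset and logistic regression model are illustrative assumptions.
import eli5
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# explain_weights returns an Explanation object; format_as_text renders it.
explanation = eli5.explain_weights(
    clf,
    feature_names=list(iris.feature_names),
    target_names=list(iris.target_names),
)
print(eli5.format_as_text(explanation))
```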
Google released the ML Fairness Gym (2020), a set of components for building simple simulations that explore potential long-run impacts of deploying machine learning-based decision systems in social environments.
- "Fairness is not Static: Deeper Understanding of Long Term Fairness via Simulation Studies" [Paper]
- [ML Fairness Gym GitHub repository]
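The repository's environments follow the OpenAI Gym reset/step interface, so a simulation reduces to a standard interaction loop. The sketch below is an assumption-laden illustration: the `environments.lending.DelayedImpactEnv` module and class names are taken from the repo's lending example and may differ by version, and it assumes the cloned repository is on `PYTHONPATH`.

```python
# Hedged sketch of an ML Fairness Gym interaction loop.
# ASSUMPTIONS: the ml-fairness-gym repo is cloned and on PYTHONPATH, and the
# environments.lending.DelayedImpactEnv name matches the repo's lending example;
# both may differ across versions.
from environments import lending

env = lending.DelayedImpactEnv()          # simulated lending environment
observation = env.reset()
for _ in range(10):
    action = env.action_space.sample()    # placeholder policy: random accept/reject
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
```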