FairShades is a model-agnostic approach for auditing the outcomes of abusive language detection systems. Combining explainability and fairness evaluation, it identifies unintended biases and the sensitive categories towards which models are most discriminatory. This objective is pursued by auditing meaningful counterfactuals generated with the CheckList framework (Ribeiro et al., 2020). We conduct several experiments on BERT-based models to demonstrate our proposal's novelty and effectiveness in unmasking biases.
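To give a flavour of the counterfactual-probing idea, the following is a minimal sketch, not the FairShades pipeline itself: it uses CheckList's Editor to fill an identity slot in a template and compares the predictions of a HuggingFace text-classification pipeline across the resulting counterfactuals. The identity list, the template, and the model name are illustrative assumptions, not the configuration used in the paper.

```python
from checklist.editor import Editor
from transformers import pipeline

# Illustrative identity terms; FairShades audits many more sensitive categories.
identities = ["women", "men", "muslims", "christians", "immigrants"]

# Generate counterfactuals by filling the {identity} slot of a single template.
editor = Editor()
samples = editor.template("I really can't stand {identity}.", identity=identities)

# Any abusive-language classifier can be plugged in, since the audit is
# model-agnostic; this model name is only a placeholder example.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

# Diverging labels or scores across identity swaps hint at an unintended bias.
for text in samples.data:
    pred = classifier(text)[0]
    print(f"{text!r:45} -> {pred['label']} ({pred['score']:.2f})")
```

In the full approach, such per-counterfactual predictions are combined with explanations to locate which sensitive terms drive the classifier's decisions.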
This project continues a master's thesis supervised by and developed with Riccardo Guidotti.
Marta Marchiori Manerba and Riccardo Guidotti. "FairShades: Fairness Auditing via Explainability in Abusive Language Detection Systems". 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI). IEEE, 2021.
Bibtex for citations:
@inproceedings{manerba2021fairshades,
  title={FairShades: Fairness Auditing via Explainability in Abusive Language Detection Systems},
  author={Manerba, Marta Marchiori and Guidotti, Riccardo},
  booktitle={2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI)},
  pages={34--43},
  year={2021},
  organization={IEEE}
}