mattbit/awesome-ai-safety

A curated list of papers & technical articles on AI Quality & Safety 📚

Apache-2.0

Awesome AI Safety

Trying to make your AI safer? Wondering how to avoid ethical biases, errors, privacy leaks, or robustness issues in your AI models?

This repository contains a curated list of papers & technical articles on AI Quality & Safety that should help 📚

Table of Contents

You can browse papers by Machine Learning task category, and use hashtags like #Robustness to explore AI risk types.

Tabular Machine Learning
Natural Language Processing
Computer Vision
Recommendation System
Time Series
General ML Testing

Tabular Machine Learning

Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift
Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability

Natural Language Processing

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness
Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
"Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability

Large Language Models

Holistic Evaluation of Language Models (Liang et al., 2022) #General
Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback

Computer Vision

DOMINO: Discovering Systematic Errors with Cross-modal Embeddings (Eyuboglu et al., 2022) #DataSlice
Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2022) #Robustness
Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging

Recommendation System

Contributions are welcome 💕

Time Series

Contributions are welcome 💕

General ML Testing

Machine learning testing: Survey, landscapes and horizons (Zhang et al., 2020) #General
Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness
A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness
Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective (Krishna et al., 2022) #Explainability
InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General