Spectral Removal of Guarded Attribute Information

This repository contains the code for the algorithm and experiments from the paper "Spectral Removal of Guarded Attribute Information" (EACL 2023).

Introduction

We propose to erase information from neural representations by truncating a singular value decomposition (SVD) of the covariance matrix between the neural representations and the protected attributes, that is, the examples representing the information to be removed. The truncation keeps only the principal directions with small singular values, which are the directions that covary least with the protected attribute.
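
The idea above can be illustrated with a minimal NumPy sketch. It is not the code in ksal.py: the function names `cross_covariance` and `sal_projection`, the toy data, and the choice of `k` (how many leading directions to drop) are assumptions made for illustration only.

```python
import numpy as np

def cross_covariance(X, Z):
    """Empirical cross-covariance between X (n, d) and Z (n, m), shape (d, m)."""
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    return Xc.T @ Zc / X.shape[0]

def sal_projection(X, Z, k):
    """Projection (d, d) that drops the k directions of X covarying most with Z."""
    U, _, _ = np.linalg.svd(cross_covariance(X, Z), full_matrices=True)
    U_keep = U[:, k:]          # left singular vectors with the smaller singular values
    return U_keep @ U_keep.T   # orthogonal projection onto their span

# Toy data (hypothetical): a binary protected attribute leaks into dimension 0.
rng = np.random.default_rng(0)
n, d = 500, 10
z = rng.integers(0, 2, size=n)
Z = np.eye(2)[z]               # one-hot encoding of the protected attribute
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * z

P = sal_projection(X, Z, k=1)
X_clean = X @ P
residual = cross_covariance(X_clean, Z)  # should be numerically zero
```

For a binary attribute encoded one-hot, the centered cross-covariance has rank one, so dropping a single direction is enough in this toy case to remove all linear covariance with Z. Real experiments may need a different `k`.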

We also describe a kernel method that solves the same problem. Rather than performing SVD on the covariance matrix, the kernel method performs a series of spectral operations on the kernel matrices of the input neural representations and the protected attributes.
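
As a rough illustration of the inputs such a kernel method works with, the sketch below builds centered kernel matrices for the representations and for the protected attributes. The RBF kernel, the `gamma` value, the linear kernel on one-hot labels, and the helper names are all assumptions; the spectral operations described in the paper are not reproduced here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def center_kernel(K):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

# Toy data (hypothetical): 50 representations and a binary protected attribute.
rng = np.random.default_rng(0)
n, d = 50, 8
X = rng.normal(size=(n, d))
Z = np.eye(2)[rng.integers(0, 2, size=n)]

K_x = center_kernel(rbf_kernel(X, gamma=0.1))  # kernel on representations
K_z = center_kernel(Z @ Z.T)                   # linear kernel on one-hot labels
```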

Experimental Setting and Datasets

We use the experimental settings from the paper "Null it out: guarding protected attributes by iterative nullspace projection", as we use the algorithm in that paper as a benchmark.

Algorithm

The implementation is available in Python (ksal.py) and Matlab.

Given example representations X of shape (number of samples, number of dimensions), bias labels Z, and optional main-task labels Y, ksal.py is designed to remove the information about Z; we found that it preserves the information about Y well. We evaluate bias before and after debiasing by training different classifiers on the pairs (X, Z), by measuring the TPR-gap between different populations (p(Y=Y'|X,Z)), and with other popular metrics such as WEAT.
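
To make the TPR-gap metric concrete, here is a small sketch of one common way to compute it for a binary protected attribute: the per-class difference in true positive rate between the two groups, summarized by its root mean square. The function names and the toy labels are hypothetical, and the evaluation notebooks in this repository may compute the metric differently.

```python
import numpy as np

def tpr_gaps(y_true, y_pred, z):
    """Per-class TPR gap: TPR(group 1) - TPR(group 0) for each main-task label."""
    y_true, y_pred, z = map(np.asarray, (y_true, y_pred, z))
    gaps = {}
    for label in np.unique(y_true):
        tprs = []
        for group in (0, 1):
            mask = (y_true == label) & (z == group)
            if not mask.any():
                break  # skip labels that are missing from one group
            tprs.append(np.mean(y_pred[mask] == label))
        if len(tprs) == 2:
            gaps[label.item()] = float(tprs[1] - tprs[0])
    return gaps

def rms_gap(gaps):
    """Root mean square of the per-class gaps."""
    return float(np.sqrt(np.mean(np.square(list(gaps.values())))))

# Toy predictions (hypothetical): two main-task classes, two groups.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
z      = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

gaps = tpr_gaps(y_true, y_pred, z)  # {0: 0.5, 1: -0.5}
summary = rms_gap(gaps)             # 0.5
```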

Experiments

Start a new virtual environment:

conda create -n SAL python=3.7 anaconda
conda activate SAL

Install jsonnet from conda-forge and the other dependencies from requirements.txt:

conda install -c conda-forge jsonnet
pip install -r requirements.txt

Setup

Use the following script to download the datasets used in this repository:

./download_data.sh

Download the English model from spaCy:

python -m spacy download en

Word Embedding Experiments (Section 6.1 in the paper)

python src/data/to_word2vec_format.py data/embeddings/glove.42B.300d.txt

python src/data/filter_vecs.py \
--input-path data/embeddings/glove.42B.300d.txt \
--output-dir data/embeddings/ \
--top-k 150000  \
--keep-inherently-gendered  \
--keep-names

Then run the notebook.

To run the word similarity experiments (Table 1):

Please check the notebook for our method, and the notebook for INLP.

Controlled Demographic experiments (Section 6.2.1 in the paper)

export PYTHONPATH=/path_to/nullspace_projection

./run_deepmoji_debiasing.sh

To recreate the evaluation used in the paper, see the SAL notebook.

Bias Bios experiments (Section 6.2.2 in the paper)

This assumes that the Bias in Bios dataset (De-Arteaga et al., 2019) is saved at data/biasbios/BIOS.pkl.

python src/data/create_dataset_biasbios.py \
        --input-path data/biasbios/BIOS.pkl \
        --output-dir data/biasbios/ \
        --vocab-size 250000

./run_bias_bios.sh

Run the BERT experiments in the BERT SAL notebook.

Run the FastText experiments in the FastText SAL notebook.
