/gender-bias-BERT

This repository holds the code for my master thesis entitles "The Association of Gender Bias with BERT - Measuring, Mitigating and Cross-lingual portability"

Primary LanguagePythonCreative Commons Attribution 4.0 InternationalCC-BY-4.0

Gender bias in BERT

This repository holds the code for my master thesis entitled "The Association of Gender Bias with BERT - Measuring, Mitigating and Cross-lingual portability", written at the University of Groningen and the University of Malta and supervised by Prof. Malvina Nissim and Prof. Albert Gatt. The thesis work was published under the title "Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias" as part of the 2nd Workshop on Gender Bias in Natural Language Processing at COLING 2020. ArXiv preprint: https://arxiv.org/abs/2010.14534

@inproceedings{bartl2020unmasking,
  title={Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias},
  author={Bartl, Marion and Nissim, Malvina and Gatt, Albert},
  editor={Costa-jussà, Marta R. and Hardmeier, Christian and Webster, Kellie and Radford, Will},
  booktitle={Proceedings of the Second Workshop on Gender Bias in Natural Language Processing},
  year={2020}
}

BEC-Pro

We created the Bias Evaluation Corpus with Professions (BEC-Pro). This corpus is designed to measure gender bias for different groups of professions and contains English and German sentences built from templates. The corpus files are BEC-Pro_EN.tsv for English and BEC-Pro_DE.tsv for German.

Code

In the code folder, run main.py, which requires the following arguments:

usage: main.py [-h] --lang LANG --eval EVAL [--tune TUNE] --out OUT [--model MODEL] [--batch BATCH]
               [--seed SEED]

  -h, --help     show this help message and exit
  --lang LANG    provide language, either EN or DE
  --eval EVAL    .tsv file with sentences for bias evaluation (BEC-Pro or transformed EEC)
  --tune TUNE    .tsv file with sentences for fine-tuning (GAP flipped)
  --out OUT      output directory and start of filename
  --model MODEL  which BERT model to use 
  --batch BATCH  fix batch-size for fine-tuning 
  --seed SEED

For English, run:

python3 main.py --lang EN --eval ../BEC-Pro/BEC-Pro_EN.tsv --tune ../data/gap_flipped.tsv --out ../data/results

For German, run:

python3 main.py --lang DE --eval ../BEC-Pro/BEC-Pro_DE.tsv --out ../data/results



Shield: CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

CC BY 4.0