
This is the repository for the Ethics for AI project at @unibo. We tackle gender discrimination in various NLP tasks by applying notions and methods presented in the literature.


♂️ Gender discrimination in Natural Language Processing ♀️

This repository contains a project realized as part of the Ethics in Artificial Intelligence course of the Master's degree in Artificial Intelligence, University of Bologna.

Description

The aim of this project is to develop a proof of concept on how to address gender discrimination in NLP. Two approaches have been investigated:

  • Hard-Debiasing on pre-trained Italian Word Embeddings
  • GN-GloVe, which reduces the bias while the word embeddings are being trained
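Hard-debiasing is usually described, following Bolukbasi et al. (2016), as two steps applied once a gender direction g has been estimated from definitional pairs: "neutralize" removes the component along g from gender-neutral words, and "equalize" makes each gendered pair symmetric with respect to the neutral subspace. The pure-Python sketch below illustrates both steps on unit vectors. It is not the code in this repository: all function names are hypothetical, and the gender direction is approximated by the mean of the normalized pair differences rather than by the first principal component used in the original method.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(u, c):
    return [c * a for a in u]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def normalize(u):
    return scale(u, 1.0 / math.sqrt(dot(u, u)))

def project(u, g):
    """Component of u along the unit vector g."""
    return scale(g, dot(u, g))

def gender_direction(pairs):
    """Hypothetical simplification: mean of normalized (male - female) differences.
    The original method takes the first principal component instead."""
    diffs = [normalize(sub(m, f)) for m, f in pairs]
    mean = [sum(col) / len(diffs) for col in zip(*diffs)]
    return normalize(mean)

def neutralize(w, g):
    """Remove the gender component of a gender-neutral word, then re-normalize."""
    return normalize(sub(w, project(w, g)))

def equalize(a, b, g):
    """Make two unit vectors of a gendered pair equidistant from every vector
    orthogonal to g. Assumes a and b differ along g (otherwise division by zero)."""
    mu = scale(add(a, b), 0.5)
    nu = sub(mu, project(mu, g))                  # shared, gender-free part
    radius = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))
    mu_b = project(mu, g)
    out = []
    for w in (a, b):
        side = normalize(sub(project(w, g), mu_b))  # +g or -g
        out.append(add(nu, scale(side, radius)))    # unit norm by construction
    return out[0], out[1]
```

In practice, the definitional pairs and the gendered pairs to equalize would be Italian word pairs looked up in the embedding file, and neutralize would be applied only to words judged gender-neutral.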

For a deeper understanding of the problem, take a look at the project presentation (presentation.pdf).

Repository structure

.
├── data                             # Contains the files of words used for the experiments
├── debiaswe                         # Contains debiasing functions
│   ├── co_occurrence.py             # Functions to compute the co-occurrence matrix for GN-GloVe
│   ├── data.py                      # Functions to load data files
│   ├── debias_glove.py              # Implementation of GN-GloVe debiasing
│   ├── metrics.py                   # Functions to compute metrics for the experiments
│   └── we.py                        # Auxiliary functions to load and manage word embeddings
├── embeddings                       # Contains the word embeddings file for the hard-debiasing approach
├── scripts                          # Contains the scripts to convert the original Twitter word embeddings to a TSV file and filter them
├── gn-glove_we_visualization.ipynb  # Visualization of the word embeddings generated by GN-GloVe
├── hard_debias_italian_we.ipynb     # Visualization of the word embeddings generated by Hard-Debiasing
├── presentation.pdf                 # Slides about the project
├── LICENSE
└── README.md
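GN-GloVe, like the original GloVe, is trained on global word-word co-occurrence counts. The pure-Python sketch below shows one common way to build such counts: it follows the GloVe convention of a symmetric context window in which two words d positions apart contribute 1/d. The function name, its parameters, and the sparse dictionary output are assumptions for illustration, not the interface of debiaswe/co_occurrence.py.

```python
from collections import defaultdict

def cooccurrence_matrix(sentences, window_size=5):
    """Hypothetical helper: sparse GloVe-style co-occurrence counts.

    sentences: iterable of token lists.
    Returns (vocab, counts) where vocab maps token -> index and counts maps
    (row_index, col_index) -> weighted count.
    """
    vocab = {}
    counts = defaultdict(float)
    for tokens in sentences:
        ids = [vocab.setdefault(t, len(vocab)) for t in tokens]
        for i, center in enumerate(ids):
            # Scan only the left context; adding each pair in both directions
            # yields the same result as a symmetric window.
            for j in range(max(0, i - window_size), i):
                increment = 1.0 / (i - j)  # words d apart contribute 1/d
                counts[(center, ids[j])] += increment
                counts[(ids[j], center)] += increment
    return vocab, dict(counts)
```

Keeping the matrix as a dictionary keyed by index pairs makes memory grow with the number of observed pairs rather than with the square of the vocabulary size.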

Results

The results of both approaches are presented below:

  • Hard-Debiasing:

  • GN-GloVe:

Versioning

We use Git for versioning.

Group members

Name        Surname     Email                                   Username
Davide      Angelani    davide.angelani@studio.unibo.it         qnozo
Eric        Rossetto    eric.rossetto@studio.unibo.it           Erhtric
Giuseppe    Murro       giuseppe.murro@studio.unibo.it          gmurro
Salvatore   Pisciotta   salvatore.pisciotta@studio.unibo.it     SalvoPisciotta
Xiaowei     Wen         xiaowei.wen@studio.unibo.it             WenXiaowei

License

This project is licensed under the MIT License; see the LICENSE file for details.