This repository contains a project developed for the Ethics in Artificial Intelligence course of the Master's degree in Artificial Intelligence at the University of Bologna.
The aim of this project is to develop a proof of concept showing how gender bias in NLP can be addressed. Two approaches have been investigated:
- Hard-Debiasing on pre-trained Italian Word Embeddings
- GN-GloVe, which reduces the bias during the training of word embeddings
For a deeper understanding of the problem, see the presentation of the project.
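As an illustration of the hard-debiasing idea listed above, here is a minimal sketch of its "neutralize" step in plain Python. It is not the code in `debiaswe`: the helper names, the toy three-dimensional vectors, and the use of an averaged difference vector as the gender direction are all assumptions made for brevity (the published method estimates the direction with PCA over several definitional pairs).

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def gender_direction(pairs, emb):
    """Average the difference vectors of the given word pairs.

    Assumption: a plain average stands in for the PCA step of the
    original hard-debiasing method.
    """
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for male, female in pairs:
        diff = [a - b for a, b in zip(emb[male], emb[female])]
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)


def neutralize(vec, direction):
    """Remove the component of vec along direction, then re-normalize."""
    proj = dot(vec, direction)
    return normalize([a - proj * d for a, d in zip(vec, direction)])


# Toy embeddings (hypothetical values, for illustration only).
toy = {
    "uomo": [1.0, 0.2, 0.0],
    "donna": [-1.0, 0.2, 0.0],
    "ingegnere": [0.5, 0.8, 0.1],
}

g = gender_direction([("uomo", "donna")], toy)
debiased = neutralize(toy["ingegnere"], g)
print(dot(debiased, g))
```

After neutralization, the word vector has no component along the estimated gender direction, which is the property this step aims for on gender-neutral words.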
.
├── data # Contains the files of words used for the experiments
├── debiaswe # Contains debiasing functions
│ ├── co_occurrence.py # Functions to compute the co-occurrence matrix for GN-GloVe
│ ├── data.py # Functions to load data files
│ ├── debias_glove.py # Actual implementation of GN-GloVe debiasing
│ ├── metrics.py # Functions to compute metrics for the experiments
│ └── we.py # Auxiliary functions to load and manage word embeddings
├── embeddings # Contains the word embeddings file for the hard-debiasing approach
├── scripts # Contains the scripts to convert the original Twitter word embeddings to a TSV file and filter them
├── gn-glove_we_visualization.ipynb # Visualization of the word embeddings generated by GN-GloVe
├── hard_debias_italian_we.ipynb # Visualization of the word embeddings generated by Hard-Debiasing
├── presentation.pdf # Slides about the project
├── LICENSE
└── README.md
The results of both approaches are presented below:
We use Git for versioning.
Name | Surname | Email | Username
---|---|---|---
Davide | Angelani | davide.angelani@studio.unibo.it | qnozo
Eric | Rossetto | eric.rossetto@studio.unibo.it | Erhtric
Giuseppe | Murro | giuseppe.murro@studio.unibo.it | gmurro
Salvatore | Pisciotta | salvatore.pisciotta@studio.unibo.it | SalvoPisciotta
Xiaowei | Wen | xiaowei.wen@studio.unibo.it | WenXiaowei
This project is licensed under the MIT License. See the LICENSE file for details.