This repository accompanies the paper Examining Gender and Power on Wikipedia Through Face and Politeness, which was accepted to SIGDIAL 2024. It contains the code and data used to train face act annotation models for Wikipedia talk pages, including the WikiFace Corpus, a subset of the Wikipedia Talk Pages Corpus that we annotated for face acts.
Assuming conda and poetry are installed, the project dependencies can be set up with the following commands.
conda create -n wikiface python=3.10
conda activate wikiface
poetry install
By default, all scripts will log their output to /home/{username}/scratch/logs/. To change this behavior, see ~line 40 of src/core/
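As a rough illustration of what changing the log location could involve, here is a minimal sketch of resolving a log directory with an optional override. It is not the repository's actual code: the function name `resolve_log_dir` and the environment variable `WIKIFACE_LOG_DIR` are hypothetical, and the sketch assumes `{username}` refers to the current login name.

```python
import getpass
import os
from pathlib import Path


def resolve_log_dir(override=None):
    """Return the directory that logs would be written to.

    Hypothetical helper, not part of this repository. It mirrors the
    documented default of /home/{username}/scratch/logs/ unless an
    explicit override is supplied.
    """
    # Assumption: WIKIFACE_LOG_DIR is an illustrative variable name only.
    chosen = override or os.environ.get("WIKIFACE_LOG_DIR")
    if chosen:
        return Path(chosen).expanduser()
    # Assumption: {username} is the login name of the current user.
    return Path("/home") / getpass.getuser() / "scratch" / "logs"
```

An explicit argument takes precedence over the environment variable, and the documented default is used only when neither is set.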
A summary of the content and structure of the repository is shown below.
|- bin/
| |- - trains face act classification models.
|- configs/
| |- llama3.json - configuration for training our reported model.
| |- predict.json - configuration for predicting with our reported model.
|- data/
| |- wikiface/ - the wikiface corpus and all unannotated talk pages.
|- outputs/
| |- ... - default location (generated) for models and results.
|- src/
| |- ... - additional utilities.
CUDA_VISIBLE_DEVICES=0 ./bin/ configs/llama3.json
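Before launching a run, it can help to check which settings a configuration file defines. The sketch below assumes only that the files in configs/ are valid JSON objects. Their key names are not documented here, so the hypothetical helper `config_keys` simply lists whatever top-level keys it finds.

```python
import json
from pathlib import Path


def config_keys(path):
    """Return the sorted top-level keys of a JSON configuration file."""
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    # A configuration is expected to be a JSON object (a dict in Python).
    if not isinstance(config, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return sorted(config)
```

For example, calling `config_keys("configs/llama3.json")` from the repository root would list the settings available in the training configuration.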
The checkpoint and predictions reported in our paper can be found on this Google Drive. They are the output of the following command, which trains Llama-3-8B on all of WikiFace and uses the resulting model to predict face acts for the unannotated Wikipedia Talk Pages Corpus.
CUDA_VISIBLE_DEVICES=0 ./bin/ configs/predict.json
title = "Examining Gender and Power on Wikipedia Through Face and Politeness",
author = "Soubki, Adil and Choi, Shyne and Rambow, Owen",
booktitle = "25th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2024)",
year = "2024",
publisher = "Association for Computational Linguistics"