Face and Politeness on Wikipedia

This repository accompanies the paper Examining Gender and Power on Wikipedia Through Face and Politeness, which was accepted to SIGDIAL 2024. It contains the code and data used to train face act annotation models for Wikipedia talk pages, including the WikiFace Corpus, a subset of the Wikipedia Talk Pages Corpus that we annotated for face acts.

Installation

Assuming conda and poetry are installed, the project dependencies can be set up with the following commands.

conda create -n wikiface python=3.10
conda activate wikiface
poetry install

By default, all scripts log their output to /home/{username}/scratch/logs/. To change this behavior, see around line 40 of src/core/context.py.
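
To check where logs are expected to land before running anything, the documented pattern can be expanded for a given user. This is a minimal sketch, not the logic in src/core/context.py: the helper name `default_log_dir` is hypothetical, and it assumes only the /home/{username}/scratch/logs/ pattern stated above.

```python
import os
from pathlib import Path


def default_log_dir(username: str) -> Path:
    """Expand the documented default log location for one user."""
    # Hypothetical helper: mirrors the pattern /home/{username}/scratch/logs/
    # from this README, not the actual code in src/core/context.py.
    return Path("/home") / username / "scratch" / "logs"


if __name__ == "__main__":
    # Fall back to a placeholder if the USER variable is not set.
    user = os.environ.get("USER", "username")
    log_dir = default_log_dir(user)
    print(f"Expected log directory: {log_dir}")
    print(f"Already exists: {log_dir.is_dir()}")
```

If the directory does not exist on your machine (for example, when your home directory is not under /home), changing the location in src/core/context.py is likely the simpler route.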

Content

A summary of the content and structure of the repository is shown below.

wikiface/
|- bin/
|  |- classification.py - trains face act classification models.
|- configs/
|  |- llama3.json       - configuration for training our reported model.
|  |- predict.json      - configuration for predicting with our reported model.
|- data/
|  |- wikiface/         - the WikiFace Corpus and all unannotated talk pages.
|- outputs/
|  |- ...               - default location (generated) for models and results.
|- src/
|  |- ...               - additional utilities.

Example Usage

CUDA_VISIBLE_DEVICES=0 ./bin/classification.py configs/llama3.json
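
The script takes a JSON configuration file as its only argument, so it can help to look at what a configuration contains before starting a long training run. The sketch below lists the top-level keys of a configuration without assuming any particular field names. The helper `top_level_keys` is hypothetical and not part of this repository, and it assumes the file holds a JSON object at the top level.

```python
import json
from pathlib import Path


def top_level_keys(config_path):
    """Return the sorted top-level keys of a JSON configuration file."""
    with Path(config_path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    # Assumption: the configuration is a JSON object, not a list or scalar.
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} does not contain a JSON object")
    return sorted(config)


if __name__ == "__main__":
    path = Path("configs/llama3.json")
    if path.is_file():
        for key in top_level_keys(path):
            print(key)
    else:
        print(f"{path} not found; run this from the repository root.")
```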

Pre-Trained Checkpoint

The checkpoint and predictions reported in our paper can be found on this Google Drive. They are the output of the following command, which trains Llama-3-8B on all of WikiFace and uses the resulting model to predict face acts for the unannotated Wikipedia Talk Pages Corpus.

CUDA_VISIBLE_DEVICES=0 ./bin/classification.py configs/predict.json

Citation