
Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

This repository provides datasets and code for preprocessing, training, and testing models for quantifying annotation disagreement, together with the official Hugging Face implementation of the following paper:

Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
Ruyuan Wan, Jaehyung Kim, Dongyeop Kang
AAAI 2023

Our code is mainly based on Hugging Face's transformers library.

Installation

The following command installs all necessary packages:

pip install -r requirements.txt

The project was tested using Python 3.7.

HuggingFace Integration

We uploaded both our datasets and model checkpoints to the Hugging Face Hub. You can load our data directly with the datasets library and load our models via simpletransformers (built on transformers).

# load our dataset
from datasets import load_dataset
dataset = load_dataset("RuyuanWan/SBIC_Disagreement")
# you can replace "SBIC_Disagreement" to "SChem_Disagreement", "Dilemmas_Disagreement", "Dynasent_Disagreement" or "Politeness_Disagreement" to change datasets

# load our model
from simpletransformers.classification import ClassificationModel, ClassificationArgs
model_args = ClassificationArgs()
model_args.regression = True
SBIC_person_demo_col_regression = ClassificationModel(
    "roberta",
    "RuyuanWan/SBIC_RoBERTa_Demographic-text_Disagreement_Predictor",
    num_labels=1,
    args=model_args
)
# you can replace "SBIC_RoBERTa_Demographic-text_Disagreement_Predictor" to other pretrained models

# predict
# you can replace the example text with other examples
text_example1 = ['Abortion should be legal']
predict1, raw_outputs1 = SBIC_person_demo_col_regression.predict(text_example1)
print(predict1)
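
Since the Demographic-text predictor was trained on text paired with an individual annotator's demographic attributes in a colon template, you can also prepend demographic information to the input. The field names and template below are illustrative assumptions, not the exact training format; see the demo notebook for the real template.

# hypothetical colon-templated demographic prefix; the exact fields and
# ordering used in training may differ (see the demo notebook)
text_example2 = ['gender: woman, age: 30-39, race: white. Abortion should be legal']
predict2, raw_outputs2 = SBIC_person_demo_col_regression.predict(text_example2)
print(predict2)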

We also provide a simple demo (Open In Colab) showing how to use these models to predict disagreement.

Datasets

We used public datasets of subjective tasks whose original raw releases contain annotators' voting records.

You can load our processed disagreement datasets using Hugging Face's datasets library, or download them from the datasets/ directory.

Here are the five datasets with disagreement labels; pass any of the following names to load_dataset:

| Dataset name in Hugging Face | Dataset information |
|---|---|
| "RuyuanWan/SBIC_Disagreement" | SBIC dataset with disagreement labels |
| "RuyuanWan/SChem_Disagreement" | SChem dataset with disagreement labels |
| "RuyuanWan/Dilemmas_Disagreement" | Dilemmas dataset with disagreement labels |
| "RuyuanWan/Dynasent_Disagreement" | Dynasent dataset with disagreement labels |
| "RuyuanWan/Politeness_Disagreement" | Politeness dataset with disagreement labels |

Models

In our disagreement prediction experiments, we compared:

  • Binary vs. continuous disagreement labels,
  • Text-only input vs. text with the annotator's demographic information,
  • Text with group-level annotator demographic information vs. text with individual-level annotator demographic information.


Here are the different models that we host on Hugging Face:

| Model name in Hugging Face | Model information |
|---|---|
| "RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Binary_Classifie" | Binary disagreement classifier trained on SBIC text |
| "RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on SBIC text (regression) |
| "RuyuanWan/SBIC_RoBERTa_Demographic-text_Disagreement_Predictor" | Disagreement predictor trained on SBIC text and individual annotators' demographic information in colon-templated format |
| "RuyuanWan/SChem_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary disagreement classifier trained on SChem text |
| "RuyuanWan/SChem_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on SChem text (regression) |
| "RuyuanWan/SChem_RoBERTa_Demographic-text_Disagreement_Predictor" | Disagreement predictor trained on SChem text and individual annotators' demographic information in colon-templated format |
| "RuyuanWan/Dilemmas_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary disagreement classifier trained on Dilemmas text |
| "RuyuanWan/Dilemmas_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on Dilemmas text (regression) |
| "RuyuanWan/Dynasent_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary disagreement classifier trained on Dynasent text |
| "RuyuanWan/Dynasent_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on Dynasent text (regression) |
| "RuyuanWan/Politeness_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary disagreement classifier trained on Politeness text |
| "RuyuanWan/Politeness_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on Politeness text (regression) |

Citation

If you find this work useful for your research, please cite our paper:

@article{wan2023everyone,
  title={Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information},
  author={Wan, Ruyuan and Kim, Jaehyung and Kang, Dongyeop},
  journal={arXiv preprint arXiv:2301.05036},
  year={2023}
}