This repository provides the datasets and code for preprocessing, training, and testing models that quantify annotation disagreement. It is the official Hugging Face implementation of the following paper:
Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
Ruyuan Wan, Jaehyung Kim, Dongyeop Kang
AAAI 2023
Our code is mainly based on Hugging Face's transformers library.
The following command installs all necessary packages:
pip install -r requirements.txt
The project was tested using Python 3.7.
We uploaded both our datasets and our model checkpoints to the Hugging Face Hub. You can load our data directly using datasets and our models using transformers. The example below loads the model through simpletransformers, which wraps transformers.
# load our dataset
from datasets import load_dataset
dataset = load_dataset("RuyuanWan/SBIC_Disagreement")
# To use a different dataset, replace "SBIC_Disagreement" with "SChem_Disagreement", "Dilemmas_Disagreement", "Dynasent_Disagreement" or "Politeness_Disagreement"
# load our model
from simpletransformers.classification import ClassificationModel, ClassificationArgs
model_args = ClassificationArgs()
model_args.regression = True
SBIC_person_demo_col_regression = ClassificationModel(
    "roberta",
    "RuyuanWan/SBIC_RoBERTa_Demographic-text_Disagreement_Predictor",
    num_labels=1,
    args=model_args,
    # simpletransformers uses CUDA by default; add use_cuda=False to run on CPU
)
# To use a different pretrained model, replace "SBIC_RoBERTa_Demographic-text_Disagreement_Predictor" with that model's name
# predict
# You can replace the example text with any other input.
text_example1 = ['Abortion should be legal']
predict1, raw_outputs1 = SBIC_person_demo_col_regression.predict(text_example1)
print(predict1)
We also provide simple demo code that shows how to use the datasets and models to predict disagreement.
We used public datasets for subjective tasks whose original raw data contain annotators' voting records:
- Social Bias Corpus (Sap et al. 2020)
- Social Chemistry 101 (Forbes et al. 2020)
- Scruples-dilemmas (Lourie, Bras, and Choi 2021)
- Dyna-Sentiment (Potts et al. 2021)
- Wikipedia Politeness (Danescu-Niculescu-Mizil et al. 2013)
You can load our processed disagreement datasets using Hugging Face's datasets library, or download them from the datasets/ directory.
Here are the five datasets with disagreement labels. Pass any of the following names to load_dataset from Hugging Face's datasets library:
Dataset name in Hugging Face | Dataset information |
---|---|
"RuyuanWan/SBIC_Disagreement" | SBIC dataset with disagreement labels |
"RuyuanWan/SChem_Disagreement" | SChem dataset with disagreement labels |
"RuyuanWan/Dilemmas_Disagreement" | Dilemmas dataset with disagreement labels |
"RuyuanWan/Dynasent_Disagreement" | Dynasent dataset with disagreement labels |
"RuyuanWan/Politeness_Disagreement" | Politeness dataset with disagreement labels |
In our disagreement prediction experiments, we compared:
- Binary vs. continuous disagreement labels,
- Text-only input vs. text with annotators' demographic information,
- Text with group-wise annotator demographic information vs. text with personal-level annotator demographic information.
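To make the first comparison concrete, here is a minimal sketch of how binary and continuous disagreement labels could be derived from one item's annotator votes. The definitions used here (binary: at least two annotators differ; continuous: share of votes outside the majority label) are assumptions for illustration and may not match the exact formulas in the paper. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def continuous_disagreement(votes: Sequence[Hashable]) -> float:
    """Assumed definition: fraction of votes that differ from the majority label."""
    if not votes:
        raise ValueError("votes must contain at least one annotation")
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def binary_disagreement(votes: Sequence[Hashable]) -> int:
    """Assumed definition: 1 if annotators gave more than one distinct label, else 0."""
    if not votes:
        raise ValueError("votes must contain at least one annotation")
    return int(len(set(votes)) > 1)


if __name__ == "__main__":
    example_votes = ["offensive", "offensive", "not offensive", "offensive"]
    print(binary_disagreement(example_votes))
    print(continuous_disagreement(example_votes))
```

Under these assumed definitions, the binary label only records whether any disagreement exists, while the continuous label also captures how strongly the annotators are split.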
Here are the models that we host on Hugging Face:
Model name in Hugging Face | Model information |
---|---|
"RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Binary_Classifie" | Binary diagreement classifier trained on SBIC text |
"RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on SBIC text(regression) |
"RuyuanWan/SBIC_RoBERTa_Demographic-text_Disagreement_Predictor" | Disagreement predictor trained on SBIC text and individual annotator's demographic information in colon templated format |
"RuyuanWan/SChem_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary diagreement classifier trained on SChem text |
"RuyuanWan/SChem_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on SChem text(regression) |
"RuyuanWan/SChem_RoBERTa_Demographic-text_Disagreement_Predictor" | Disagreement predictor trained on Schem text and individual annotator's demographic information in colon templated format |
"RuyuanWan/Dilemmas_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary diagreement classifier trained on Dilemmas text |
"RuyuanWan/Dilemmas_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on Dilemmas text(regression) |
"RuyuanWan/Dynasent_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary diagreement classifier trained on Dilemmas text |
"RuyuanWan/Dynasent_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on Dynasent text(regression) |
"RuyuanWan/Politeness_RoBERTa_Text_Disagreement_Binary_Classifier" | Binary diagreement classifier trained on Politeness text |
"RuyuanWan/Politeness_RoBERTa_Text_Disagreement_Predictor" | Disagreement predictor trained on Politeness text(regression) |
If you find this work useful for your research, please cite our paper:
@article{wan2023everyone,
title={Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information},
author={Wan, Ruyuan and Kim, Jaehyung and Kang, Dongyeop},
journal={arXiv preprint arXiv:2301.05036},
year={2023}
}