The GrepBiasIR dataset provides a set of bias-sensitive queries, namely gender-neutral queries for which biases in the retrieval results are considered socially problematic. The queries cover seven gender dimensions on topics such as physical capabilities and child care. Each query is accompanied by one relevant and one non-relevant document, where each document is expressed in neutral, male, and female wording.
@inproceedings{krieg2022grep,
title={Grep-BiasIR: a dataset for investigating gender representation-bias in information retrieval results},
author={Krieg, Klara and Parada-Cabaleiro, Emilia and Medicus, Gertraud and Lesota, Oleg and Schedl, Markus and Rekabsaz, Navid},
booktitle={Proceedings of the 2023 ACM SIGIR Conference On Human Information Interaction And Retrieval (CHIIR)},
year={2022}
}
Preprint: https://arxiv.org/pdf/2201.07754.pdf
The dataset consists of the queries.csv file, which contains all queries, and seven files containing the documents for each of the seven topics. The format of these files is explained below:
queries.csv:
q_id - unique ID of the query
category - query category (one of seven)
query - query text
queries-documents_[CATEGORY].csv:
[CATEGORY] - one of the seven query categories
q_id - unique ID of the query
d_id - unique ID of the document
relevant - document-to-query relevance judgement (1 - relevant, 0 - not relevant)
query - query text
doc_title - title of the document
document - text of the document
content_gender - gender indication inferred from the text of the document (F - female, M - male, N - neutral)
exp_stereotype - expected stereotype annotation
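As a rough illustration of the file layout described above, the following Python sketch reads one queries-documents_[CATEGORY].csv file and checks that every query has the six expected document variants (relevant and non-relevant, each in neutral, male, and female wording). It assumes the files are comma-delimited with a header row that uses the field names listed above; the helper names and the sample rows (IDs, texts, and the exp_stereotype placeholder) are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# The six variants each query is expected to have, per the description above:
# relevant (1) / non-relevant (0) crossed with neutral (N), male (M), female (F).
EXPECTED_VARIANTS = {(rel, g) for rel in (0, 1) for g in ("N", "M", "F")}


def read_documents(handle):
    """Read rows from an open queries-documents_[CATEGORY].csv handle.

    Assumption: comma-delimited, header row with the documented field names.
    """
    rows = []
    for row in csv.DictReader(handle):
        row["relevant"] = int(row["relevant"])  # 1 = relevant, 0 = not relevant
        rows.append(row)
    return rows


def group_by_query(rows):
    """Map q_id -> {(relevant, content_gender): row}."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["q_id"]][(row["relevant"], row["content_gender"])] = row
    return dict(grouped)


def missing_variants(grouped):
    """Return {q_id: set of missing (relevant, content_gender) pairs}."""
    missing = {}
    for q_id, variants in grouped.items():
        gap = EXPECTED_VARIANTS - set(variants)
        if gap:
            missing[q_id] = gap
    return missing


# Hypothetical sample rows, only to show the expected shape of a file.
SAMPLE = """q_id,d_id,relevant,query,doc_title,document,content_gender,exp_stereotype
q1,d1,1,example query,Title A,Relevant neutral text,N,placeholder
q1,d2,1,example query,Title A,Relevant male text,M,placeholder
q1,d3,1,example query,Title A,Relevant female text,F,placeholder
q1,d4,0,example query,Title B,Non-relevant neutral text,N,placeholder
q1,d5,0,example query,Title B,Non-relevant male text,M,placeholder
q1,d6,0,example query,Title B,Non-relevant female text,F,placeholder
"""

rows = read_documents(io.StringIO(SAMPLE))
grouped = group_by_query(rows)
print(len(rows), "rows for", len(grouped), "query")
print("missing variants:", missing_variants(grouped))
```

To run this against the real data, pass an open file handle for one of the seven document files (opened with newline="" as the csv module recommends) to read_documents instead of the in-memory sample.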
Using GrepBiasIR, Kopeinik et al. (citation below) conduct a user study to observe and measure the potential biases of search engine users when formulating queries on gender-sensitive topics. The dataset of these formulated queries is available here: https://github.com/CPJKU/user-interaction-gender-bias-IR
@inproceedings{Kopeinik2023Show,
title={Show me a "Male Nurse"! How Gender Bias is Reflected in the Query Formulation of Search Engine Users},
author={Kopeinik, Simone and Mara, Martina and Ratz, Linda and Krieg, Klara and Schedl, Markus and Rekabsaz, Navid},
booktitle={Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI)},
year={2023}
}