GenderStance

To identify gender bias as well as understand how it arises in stance detection, we construct a challenging dataset, GenderStance, to explore the predictive differences of models on samples that differ only by gender. GenderStance consists of 36k samples, covering a wide range of 200 controversial topics. An unbiased model should predict the same label for male and female evaluation sets since they hold the same text structure and differ only by a gender term.

First, we create a domain list by extracting 20 pre-defined categories from Kialo, which is a structured online debate platform where users provide supporting and opposing claims for each controversial topic. Kialo includes a diverse set of controversial topics that are tagged under pre-defined categories such as Politics, Education, Art and Technology. Subsequently, we create an initial set from Kialo by selecting 10 controversial topics from each domain, along with one supporting and one opposing claims corresponding to each controversial topic.

Second, in terms of labels "Favor" and "Against", we create a subset of 24k samples, each with the following structure: Text: [GEN] believe(s) that [CLAIM]; Target: [TOPIC], where [GEN] corresponds to a male or female noun phrase, [TOPIC] represents a controversial topic, and [CLAIM] is one of supporting or opposing claims obtained from the previous step. Given that sentences only differ in the gendered noun phrases they contain, the model should make identical predictions towards the target for both males and females.

  1. Target: public funding for higher education. Stance: Against.
    1. Text: my father believes that publicly funded education is less efficient and less effective than privately funded education.
    2. Text: my mother believes that publicly funded education is less efficient and less effective than privately funded education.
  2. Target: public funding for higher education. Stance: Favor.
    1. Text: my father believes that publicly funded higher education would lead to a more well-educated population, which has social and economic benefits
    2. Text: my mother believes that publicly funded higher education would lead to a more well-educated population, which has social and economic benefits

Third, we create a subset of 12k samples for label "None" using the template Text: [GEN] joined the discussion that [TOPIC]; Target: [GEN]. Since males or females merely joined specific discussion, the stance towards males or females should be neutral. Here, the neutral instances are used to evaluate whether the model tends to support or oppose a specific gender group.

  1. Target: my father. Stance: None.
    Text: my father joined the discussion that higher education should be publicly funded.
  2. Target: my mother. Stance: None.
    Text: my mother joined the discussion that higher education should be publicly funded.

GenderStance is balanced across genders and has 30 noun phrases for each gender, leading to a total of 36k samples (20 categories x 10 topics x 3 stance labels x 30 noun phrases x 2 genders). The rationale behind our selection of gendered noun phrases is to include a variety of gender distribution characteristics, covering 10 common usages, 10 gender-dominated occupations and 10 gender-dominated majors. Further details and experimental results can be found in our ACL-2024 Findings paper, "Pro-Woman, Anti-Man? Identifying Gender Bias in Stance Detection."

Format of Targets

Originally, the topics from Kialo were presented as claims rather than noun phrases. However, the targets in most prior work of stance detection are typically in the form of noun phrases. Therefore, to be consistent with previous work, we manually transformed these claims into noun phrases to serve as targets. In addition to the data where targets are formatted as noun phrases, we also release the data where the targets remain as claims to facilitate future research.

  1. Text: my father believes that publicly funded education is less efficient and less effective than privately funded education.
    1. Target (noun phrase): public funding for higher education. Stance: Against.
    2. Target (claim): higher education should be publicly funded. Stance: Against.

Citation

If you use this data, please cite the following paper:

Pro-Woman, Anti-Man? Identifying Gender Bias in Stance Detection.

Contact

If you have questions about this work or data, please contact Yingjie Li (liyingjie AT westlake DOT edu.cn).