Github Repository for TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings

TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings

Full Paper: https://aclanthology.org/2023.emnlp-main.694/

Stance detection is important for understanding different attitudes and beliefs on the Internet. However, because a passage's stance toward a topic often depends heavily on that topic, building a stance detection model that generalizes to unseen topics is difficult. In this work, we propose using contrastive learning, together with an unlabeled dataset of news articles covering a variety of topics, to train topic-agnostic (TAG) and topic-aware (TAW) embeddings for use in downstream stance detection. Combining these embeddings in our full TATA model, we achieve state-of-the-art performance across several public stance detection datasets (0.771 F1-score on the Zero-shot VAST dataset).

Topic-Aware (TAW) Dataset

In this work, using a dataset of news articles from 3,074 news websites together with the MPNet model, the Parrot paraphraser, and Flan-T5-XL, we extract and pair paragraphs with similar topics from different websites for use in training a Topic-Aware (TAW) encoding model. We supply both an extended dataset of 238,228 (with no more than 1,000 paragraphs from any one website) and an unfiltered dataset of 984,539 (with an unrestricted number of paragraphs from any website). To request either or both datasets, please fill out this Google form. These datasets may only be used for research purposes; the copyright of the articles within them belongs to the respective websites.
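The per-website cap described above (no more than 1,000 paragraphs from any one website) can be illustrated with a short sketch. This is not the authors' filtering code: the function name `cap_per_website`, the `(website, paragraph)` row format, and the choice to keep the first paragraphs encountered are all assumptions made for illustration, since this README does not state how the retained paragraphs were selected.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def cap_per_website(
    rows: Iterable[Tuple[str, str]], max_per_site: int = 1000
) -> List[Tuple[str, str]]:
    """Keep at most `max_per_site` paragraphs per website.

    Hypothetical helper: `rows` is assumed to be an iterable of
    (website, paragraph) pairs, and the first paragraphs seen for
    each website are the ones kept.
    """
    counts = defaultdict(int)  # website -> number of paragraphs kept so far
    kept = []
    for site, paragraph in rows:
        if counts[site] < max_per_site:
            counts[site] += 1
            kept.append((site, paragraph))
    return kept
```

Calling `cap_per_website(rows)` with the default limit applies a 1,000-paragraph cap; a smaller `max_per_site` is convenient for trying the function on a handful of rows.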

Topic-Agnostic (TAG) Dataset

To initially train the topic-agnostic encoding layer used in our stance detection model, we extended the original VAST dataset using the Dipper Paraphraser. You can download the extended VAST/TAG dataset at the following link. As in the original VAST dataset, 0=against, 1=pro, 2=neutral.
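The integer label encoding above can be decoded with a small lookup table. The following is a minimal sketch; the names `STANCE_LABELS` and `decode_stance` are hypothetical and not part of this repository, and the sketch makes no assumption about the file layout or column names of the dataset.

```python
# Hypothetical helper, not part of this repository.
# Label encoding as stated in this README: 0=against, 1=pro, 2=neutral.
STANCE_LABELS = {0: "against", 1: "pro", 2: "neutral"}


def decode_stance(label: int) -> str:
    """Map an integer stance label to its human-readable name."""
    try:
        return STANCE_LABELS[int(label)]
    except KeyError:
        raise ValueError(f"unknown stance label: {label!r}") from None
```

Raising on unknown values, rather than returning a default, makes a mislabeled or misparsed row fail loudly during data loading.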

Request TATA Model Weights

In this work, we benchmark three models: a Topic-Agnostic model (TAG), a Topic-Aware model (TAW), and a model named TATA that incorporates both the TAG and TAW models. To request the weights for these models, please fill out the following Google form.

Citing the paper

If you use the code or datasets from this paper, you can cite us with the following BibTeX entry:

@inproceedings{hanley2023tata,
    title={{TATA}: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings},
    author={Hanley, Hans W. A. and Durumeric, Zakir},
    booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
    year={2023},
    url={https://openreview.net/forum?id=J9Vx7eTuWb}
}

License and Copyright

Copyright 2024 The Board of Trustees of The Leland Stanford Junior University

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.