This repository open-sources the code and datas used in our paper「Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration」
Please cite our paper and kindly give a star for this repository if you use our code or data.
Seeing in requirement.txt
You could using pip install -r requirement.txt
to install the required packages.
Download your needed model weights into model_state
or remove all model_state/
dir prefix in all config files in configs
to automatically download the model weights.
Download the Sem16, P-stance, and VAST or other stance detection dataset, place them into dataset/<dataset name>
Process the datasets into the following format:
# Each file is a csv file, containing at least the three keys 'Tweet', 'Target', 'Stance'
- datasets
- <dataset name>
- in-target
- <target name>
- train.csv
- valid.csv
- test.csv
- <target name>
- ...
- zero-shot
- <target name>
- train.csv
- valid.csv
- test.csv
- <target name>
- ...
- <dataset name>
- ...
The way of how I process the datasets is shown in datasets/preprocess_datasets.py
sh scripts/run_FACTUAL.sh
Take in-target stance detection on p-stance for example
>>> sh scripts/run_FACTUAL.sh
>>> input training dataset: [sem16, p_stance, vast]: p_stance
>>> input train dataset mode: [in_target, zero_shot]: in_target
>>> input model framework: [rationale, cad]: cad
>>> input llm name: [gpt, llama]: gpt
>>> input model name: [bert_base, roberta_base, bertweet_base, robert_base_sentiment, kebert]: roberta_base
>>> input running mode: [sweep, wandb, normal]: normal
>>> input training cuda idx: Your Cuda index