Datasets and scripts for the ACL 2024 Findings paper "Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset".
Column Name | Description |
---|---|
ID | problem ID |
ORIGINAL_ID | (INTERNAL) original problem ID |
premises_ja | two premises in Japanese |
hypothesis_ja | one hypothesis in Japanese |
premises_en | two premises in English |
hypothesis_en | one hypothesis in English |
gold | correct answer, the relationship of the hypothesis to the premises (entailment, contradiction, neutral) |
mood | the form of each premise and of the conclusion (three letters, each one of A, E, I, or O) |
inference-type | type of logical inference (syllogism, propositional) |
content-type | classification based on belief congruency (symbolic, congruent, incongruent) |
conversion | whether the problem is associated with the conversion error (yes, no) |
atmosphere | whether the problem is associated with the atmosphere effect (yes, no) |
- See our paper for details on content-type, inference-type, conversion, and atmosphere.
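
The columns above can be read with the standard `csv` module and used to break accuracy down by a bias-related column such as `content-type`. The following is a minimal sketch, not the repository's evaluation code: the two sample rows, the tab delimiter, the `predictions` mapping, and the helper names `read_rows` and `accuracy_by` are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-row sample using a subset of the documented columns.
# The delimiter (tab) is an assumption; adjust it to the actual data file.
SAMPLE_TSV = (
    "ID\tpremises_en\thypothesis_en\tgold\tcontent-type\n"
    "1\tAll A are B. All B are C.\tAll A are C.\tentailment\tsymbolic\n"
    "2\tAll A are B. No B are C.\tSome A are C.\tcontradiction\tsymbolic\n"
)


def read_rows(handle, delimiter="\t"):
    """Parse a delimited file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def accuracy_by(rows, predictions, column="content-type"):
    """Return {column value: accuracy}; predictions maps problem ID -> label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        if predictions.get(row["ID"]) == row["gold"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


rows = read_rows(io.StringIO(SAMPLE_TSV))
# Hypothetical model outputs, for illustration only.
predictions = {"1": "entailment", "2": "neutral"}
print(accuracy_by(rows, predictions))
```

To use a real file, replace the `io.StringIO` sample with an opened file handle, for example `open(path, newline="", encoding="utf-8")`.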
Column Name | Description |
---|---|
ID | problem ID |
premises_ja | two premises in Japanese |
hypothesis_ja_1 | hypothesis 1 in Japanese |
hypothesis_ja_2 | hypothesis 2 in Japanese |
hypothesis_ja_3 | hypothesis 3 in Japanese |
hypothesis_ja_4 | hypothesis 4 in Japanese |
hypothesis_ja_5 | hypothesis 5 in Japanese |
premises_en1 | two premises in English |
hypothesis_en_1 | hypothesis 1 in English |
hypothesis_en_2 | hypothesis 2 in English |
hypothesis_en_3 | hypothesis 3 in English |
hypothesis_en_4 | hypothesis 4 in English |
hypothesis_en_5 | hypothesis 5 in English |
gold | correct answer (1-5) |
content-type | classification based on belief congruency (symbolic, contentual, congruent, incongruent) |
mood | the form of each premise and of the conclusion (three letters, each one of A, E, I, or O) |
figure | code for the arrangement of the terms in the premises (1-4) |
- NOTE: One of the five hypotheses is "none of them".
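
A row with these columns can be turned into a numbered multiple-choice question and scored against `gold`. The sketch below is only an illustration: the prompt wording is not the one used in the paper, the sample row is made up, and the helper names `hypotheses`, `format_prompt`, and `is_correct` are hypothetical. Column names follow the table above as written, including `premises_en1`.

```python
def hypotheses(row, lang="en"):
    """Collect the five candidate hypotheses for the given language."""
    return [row[f"hypothesis_{lang}_{i}"] for i in range(1, 6)]


def format_prompt(row, lang="en"):
    """Build a numbered multiple-choice prompt (wording is an assumption)."""
    premises = row["premises_en1" if lang == "en" else "premises_ja"]
    lines = [f"Premises: {premises}"]
    lines += [f"{i}. {h}" for i, h in enumerate(hypotheses(row, lang), start=1)]
    lines.append("Answer with a number from 1 to 5.")
    return "\n".join(lines)


def is_correct(row, answer):
    """Compare a model answer (int or digit string) with the gold label."""
    return int(row["gold"]) == int(answer)


# Made-up row for illustration; it is not taken from the dataset.
sample_row = {
    "ID": "1",
    "premises_en1": "All A are B. All B are C.",
    "hypothesis_en_1": "All A are C.",
    "hypothesis_en_2": "No A are C.",
    "hypothesis_en_3": "Some A are C.",
    "hypothesis_en_4": "Some A are not C.",
    "hypothesis_en_5": "None of them.",
    "gold": "1",
}

print(format_prompt(sample_row))
print(is_correct(sample_row, "1"))
```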
data/naloma2023/NeuBAROCO_NALOMA.tsv
- Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases (Ando et al., NALOMA-WS 2023)
git clone https://github.com/kmineshima/NeuBAROCO
cd NeuBAROCO
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export OPENAI_API_KEY=<YOUR_KEY> # For OpenAI API
export HUGGINGFACE_API_KEY=<YOUR_KEY> # For HuggingFace Inference Endpoints API
python -m scripts.experiments.acl2024 --help
Example (NLI task):
python -m scripts.experiments.acl2024 nli --test_n=all --lang en ja --model gpt-3.5-turbo-1106 gpt-4-0613
Example (5-choice task):
python -m scripts.experiments.acl2024 choice5 --test_n=all --lang en ja --model gpt-3.5-turbo-1106 gpt-4-0613
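
When launching several runs from a script, it can help to check the API keys and assemble the command line programmatically. The sketch below assumes only the environment variable names and command-line flags shown above; the helpers `missing_keys` and `build_command` and the backend labels `openai` and `huggingface` are hypothetical and not part of the repository.

```python
import os
import sys

# Environment variable names as given in the setup steps above.
# The backend labels used as keys are hypothetical.
REQUIRED_BY_BACKEND = {
    "openai": "OPENAI_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
}


def missing_keys(backends, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    names = [REQUIRED_BY_BACKEND[b] for b in backends]
    return [name for name in names if not environ.get(name)]


def build_command(task, langs, models, test_n="all"):
    """Assemble the argv list for one run, mirroring the examples above."""
    return [
        sys.executable, "-m", "scripts.experiments.acl2024", task,
        f"--test_n={test_n}",
        "--lang", *langs,
        "--model", *models,
    ]


command = build_command("nli", ["en", "ja"], ["gpt-3.5-turbo-1106", "gpt-4-0613"])
print(" ".join(command[1:]))
print(missing_keys(["openai"], environ={"OPENAI_API_KEY": "dummy"}))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root once `missing_keys` returns an empty list.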
If you use this data in any published research, please cite the following:
- ACL Anthology: Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset (Ozeki et al., Findings 2024)
- arXiv preprint
@inproceedings{ozeki-etal-2024-exploring,
title = "Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the {N}eu{BAROCO} Dataset",
author = "Ozeki, Kentaro and
Ando, Risako and
Morishita, Takanobu and
Abe, Hirohiko and
Mineshima, Koji and
Okada, Mitsuhiro",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.950",
pages = "16063--16077",
}