This repository contains supplementary materials for the paper "It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning" (Findings of ACL 2021).
To run the experiments, you will need the Hugging Face transformers library; any version starting from 4.0.1 should work. The code also uses nltk, torch, numpy, and scikit-learn.
For specific dependency versions, see requirements.txt.
To run the supervised finetuning baselines, you will also need to install Apex.
- Use the provided input file dataset.tsv.
- To compute the MAS baseline, run:
python calculate_MAS_score_patched.py --model bert-base-multilingual-uncased --input_file dataset.tsv > results.mbert.MAS.txt
python calculate_MAS_score_patched.py --model xlm-roberta-large --input_file dataset.tsv > results.xlmr.MAS.txt
- To compute several unsupervised baselines, run:
python calculate_baselines.py --model xlm-roberta-large --input_file dataset.tsv > results.xlmr.baselines.txt
python calculate_baselines.py --model bert-base-multilingual-uncased --input_file dataset.tsv > results.mbert.baselines.txt
- To run the supervised baselines, proceed to the supervised_baselines directory and run the following scripts:
For multilingual BERT: bash run_mbert.sh
For XLM-R: bash run_xlm_roberta.sh
These baselines use code from the bert-commonsense repository for the paper "A Surprisingly Robust Trick for Winograd Schema Challenge" (Kocijan et al., 2019).
- Finally, to calculate scores of the proposed method using multilingual BERT, run:
python dump_attns.py --model bert-base-multilingual-uncased --input_file dataset.tsv --output_file dump.mbert.attn.tsv
mkdir splits_mbert
python make_splits.py --input_file dump.mbert.attn.tsv --output_dir splits_mbert
python calculate_scores_on_splits.py splits_mbert result_scores.mbert.tsv
- To do the same with the XLM-RoBERTa model, run:
python dump_attns.py --model xlm-roberta-large --input_file dataset.tsv --output_file dump.xlmr.attn.tsv
mkdir splits_xlmr
python make_splits.py --input_file dump.xlmr.attn.tsv --output_dir splits_xlmr
python calculate_scores_on_splits.py splits_xlmr result_scores.xlmr.tsv
- To draw the attention visualization, use:
python draw_attns_map.py --model xlm-roberta-large --output_file map.html --input_file dataset.selected.tsv
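The three commands for the proposed method (dump attentions, make splits, calculate scores) differ between the two models only in the model name and the short tag used in file names. The following is a minimal Python sketch of a driver that builds those commands. The functions `build_pipeline` and `run_pipeline` are hypothetical helpers and are not part of this repository; the script names, flags, and file names are copied from the commands listed above.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(model, tag, input_file="dataset.tsv"):
    """Return (splits_dir, commands) mirroring the three steps listed above.

    `tag` is the short name used in file names ("mbert" or "xlmr").
    """
    dump_file = f"dump.{tag}.attn.tsv"
    splits_dir = f"splits_{tag}"
    commands = [
        ["python", "dump_attns.py", "--model", model,
         "--input_file", input_file, "--output_file", dump_file],
        ["python", "make_splits.py", "--input_file", dump_file,
         "--output_dir", splits_dir],
        ["python", "calculate_scores_on_splits.py", splits_dir,
         f"result_scores.{tag}.tsv"],
    ]
    return splits_dir, commands


def run_pipeline(model, tag):
    """Create the splits directory, then run each step, stopping on failure."""
    splits_dir, commands = build_pipeline(model, tag)
    Path(splits_dir).mkdir(exist_ok=True)  # stands in for the manual mkdir
    for command in commands:
        subprocess.run(command, check=True)


# Dry run: only print the commands that run_pipeline would execute.
for model, tag in [("bert-base-multilingual-uncased", "mbert"),
                   ("xlm-roberta-large", "xlmr")]:
    for command in build_pipeline(model, tag)[1]:
        print(shlex.join(command))
```

Calling `run_pipeline("xlm-roberta-large", "xlmr")` from the repository root would then execute the steps in order; the loop at the bottom only prints them.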
XWINO is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities.
The datasets that comprise XWINO are:
- The original Winograd Schema Challenge (Levesque, 2012);
- Additional data from the SuperGLUE WSC benchmark (Wang et al., 2019);
- The Definite Pronoun Resolution dataset (Rahman and Ng, 2012) (accessed from https://github.com/Yre/wsc_naive);
- A collection of French Winograd Schemas (Amsili and Seminck, 2017);
- A Japanese translation of the Winograd Schema Challenge (柴田知秀 et al., 2015);
- The Russian Winograd Schema Challenge (Shavrina et al., 2020);
- A collection of Winograd Schemas in Chinese;
- Winograd Schemas in Portuguese (Melo et al., 2019).
The columns of the TSV-formatted dataset are:
- A two-letter language code (ISO 639-1);
- Source dataset identifier;
- English reference schema (if one exists, otherwise "?");
- Schema raw text;
- JSON of the NLTK-tokenized sentence text;
- JSON with the reference pronoun specification:
[<raw text>, <token ids range>, <tokenized>]
- JSON with answer candidates specification:
[[<answer1 raw text>, <answer1 token ids range>, <answer1 tokenized>, <correct answer (binary)>], [<answer2 raw text>, <answer2 token ids range>, <answer2 tokenized>, <correct answer (binary)>]]
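The column layout above can be read with the Python standard library alone. The sketch below is illustrative and makes several assumptions that the description does not confirm: that the file has no header row, that the correct-answer flag is stored as 0/1 or a JSON boolean, and that token ranges are two-element lists. The `Schema` and `Candidate` types, the helper names, and the sample row (including its source identifier) are made up for the example and are not taken from dataset.tsv.

```python
import csv
import io
import json
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    token_range: list
    tokens: list
    is_correct: bool


class Schema(NamedTuple):
    lang: str
    source: str
    english_ref: str
    text: str
    tokens: list
    pronoun: list        # [raw text, token ids range, tokenized]
    candidates: list     # list of Candidate


def _as_bool(flag):
    # Assumption: the flag is 0/1 or true/false.
    return str(flag).lower() in ("1", "true")


def parse_row(row):
    """Convert one 7-column TSV row into a Schema."""
    lang, source, english_ref, text, tokens, pronoun, candidates = row
    return Schema(
        lang, source, english_ref, text,
        json.loads(tokens),
        json.loads(pronoun),
        [Candidate(c[0], c[1], c[2], _as_bool(c[3]))
         for c in json.loads(candidates)],
    )


def read_schemas(handle):
    """Read all rows; QUOTE_NONE keeps the JSON double quotes untouched."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [parse_row(row) for row in reader if row]


# Synthetic row for illustration only (not copied from the dataset).
SAMPLE_TOKENS = ["The", "trophy", "did", "not", "fit", "in", "the",
                 "suitcase", "because", "it", "was", "too", "big", "."]
SAMPLE_ROW = "\t".join([
    "en",
    "hypothetical-source",
    "?",
    "The trophy did not fit in the suitcase because it was too big.",
    json.dumps(SAMPLE_TOKENS),
    json.dumps(["it", [9, 10], ["it"]]),
    json.dumps([["The trophy", [0, 2], ["The", "trophy"], 1],
                ["the suitcase", [6, 8], ["the", "suitcase"], 0]]),
])

schema = read_schemas(io.StringIO(SAMPLE_ROW + "\n"))[0]
correct = next(c for c in schema.candidates if c.is_correct)
print(schema.lang, schema.pronoun[0], correct.text)
```

To read the real file, `read_schemas(open("dataset.tsv", encoding="utf-8", newline=""))` should work under the same assumptions.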
@misc{tikhonov2021heads,
title={It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning},
author={Alexey Tikhonov and Max Ryabinin},
year={2021},
eprint={2106.12066},
archivePrefix={arXiv},
primaryClass={cs.CL}
}