
[EMNLP 2021] Table-based Fact Verification with Salience-aware Learning.

Primary language: Python. License: Apache License 2.0.

Salience-aware-Learning

Code for our paper Table-based Fact Verification with Salience-aware Learning at EMNLP 2021 Findings.

Installation

pip install -r requirements.txt

Install pytorch_scatter.

Data

We conduct experiments on the TabFact dataset. The statements in the officially released train/val/test sets are lemmatized; we use the raw (unlemmatized) statements instead. More discussion can be found in this issue.

Download the train/val/test set to ./data.

Download the table set to ./data/tables.

To convert raw data to model inputs:

cd data
python preprocess.py

Token Salience Detection

cd token_salience
  • First, run bash run_origin.sh to get predictions for original inputs.
  • Second, run bash run_masked.sh to get predictions for inputs with masked tokens.
  • Third, run python calculate_salience.py to get salience scores by comparing the outputs of the previous two steps.
  • Finally, run python add_salience_to_data.py to merge the salience scores into input data.
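The comparison in the third step can be sketched as follows. This is a minimal illustration, not the code in calculate_salience.py: it assumes a token's salience is the drop in the gold-label probability when that token is masked, and the function name salience_scores, its argument layout, and the label convention (0 = refuted, 1 = entailed) are hypothetical.

```python
from typing import List, Sequence


def salience_scores(
    origin_probs: Sequence[float],
    masked_probs: Sequence[Sequence[float]],
    label: int,
) -> List[float]:
    """Score each token by how much masking it lowers the gold-label probability.

    origin_probs: class probabilities predicted for the original statement.
    masked_probs: one row of class probabilities per token, predicted with
        that token masked.
    label: index of the gold class.
    """
    base = origin_probs[label]
    return [base - probs[label] for probs in masked_probs]


# Toy example: a three-token statement and two classes
# (assumed convention: 0 = refuted, 1 = entailed).
origin = [0.1, 0.9]
masked = [[0.15, 0.85], [0.7, 0.3], [0.1, 0.9]]
scores = salience_scores(origin, masked, label=1)

# Index of the token whose masking hurts the prediction most.
most_salient = max(range(len(scores)), key=scores.__getitem__)
```

A larger score means the prediction depends more on that token; a score near zero marks a token as a candidate for replacement in the next stage.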

Non-salient Token Replacement

cd token_replacement
  • First, run bash run_mlm.sh to get predictions for replacing non-salient tokens.
  • Second, run python add_token_replacement.py to merge the token replacement candidates into input data.
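The replacement idea can be sketched as follows. This is an illustration under stated assumptions, not the code in add_token_replacement.py: the function replace_non_salient, the num_replace budget, and the candidates mapping (token position to a substitute proposed by a masked language model) are all hypothetical.

```python
from typing import Dict, List, Sequence


def replace_non_salient(
    tokens: Sequence[str],
    salience: Sequence[float],
    candidates: Dict[int, str],
    num_replace: int = 1,
) -> List[str]:
    """Replace the lowest-salience tokens that have a candidate substitute."""
    # Positions sorted from least to most salient (ties keep original order).
    order = sorted(range(len(tokens)), key=lambda i: salience[i])
    replaced = list(tokens)
    budget = num_replace
    for i in order:
        if budget == 0:
            break
        if i in candidates:
            replaced[i] = candidates[i]
            budget -= 1
    return replaced


tokens = ["the", "team", "won", "five", "games"]
salience = [0.01, 0.20, 0.45, 0.60, 0.05]
# Position -> hypothetical masked-language-model prediction.
candidates = {0: "this", 4: "matches"}
augmented = replace_non_salient(tokens, salience, candidates, num_replace=2)
```

Only the least salient positions are touched, so the tokens that drive the verification label ("won", "five" in this toy input) stay unchanged.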

Joint Fact Verification and Salient Token Prediction

cd joint_model
bash run_joint_model.sh
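A joint objective of this kind can be sketched as follows. This is an assumption-laden illustration, not the loss implemented in joint_model: it assumes the auxiliary task is predicting salient tokens that have been masked, that the two losses are combined by a weighted sum, and the names joint_loss and aux_weight are hypothetical.

```python
import math
from typing import Sequence


def joint_loss(
    verif_probs: Sequence[float],
    label: int,
    masked_token_probs: Sequence[float],
    aux_weight: float = 1.0,
) -> float:
    """Verification cross-entropy plus weighted salient-token prediction loss.

    verif_probs: class probabilities for the statement.
    label: index of the gold class.
    masked_token_probs: probability the model assigns to the original token
        at each masked salient position.
    aux_weight: hypothetical weight on the auxiliary loss.
    """
    verif_loss = -math.log(verif_probs[label])
    if masked_token_probs:
        # Mean negative log-likelihood of the original tokens.
        token_loss = -sum(math.log(p) for p in masked_token_probs) / len(masked_token_probs)
    else:
        token_loss = 0.0
    return verif_loss + aux_weight * token_loss


loss = joint_loss(
    verif_probs=[0.2, 0.8],
    label=1,
    masked_token_probs=[0.5, 0.25],
    aux_weight=0.5,
)
```

Setting aux_weight to 0 reduces the objective to plain fact verification, which makes the contribution of the auxiliary task easy to ablate.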

Citing

@inproceedings{wang-etal-2021-table-based,
    title = "Table-based Fact Verification With Salience-aware Learning",
    author = "Wang, Fei  and
      Sun, Kexuan  and
      Pujara, Jay  and
      Szekely, Pedro  and
      Chen, Muhao",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.338",
    pages = "4025--4036"
}