unifee: A Python repository from WilliamZR

Code for Unified Evidence Extraction for Fact Verification over Tables and Texts at EACL2023.

Update

Results for each evidence extraction is now available!

You can download them at google drive link.

Shared Task

The majority of the base-line code and its documentation remains intact. We extend it by adding scripts to incorporate neural re-ranking models.

Install Requirements

Create a new Conda environment and install torch:

conda env freate -f feverous.yaml

Download PLM checkpoints

Download RoBERTa TAPAS from huggingface

Prepare Data

Call the following script to download the FEVEROUS data:

./scripts/download_data.sh

Or you can download the data from the FEVEROUS dataset page directly. Namely:

Training Data
Development Data
Test Data
Wikipedia Data as a database (sqlite3)

unpack the given downloaded data to DCUF_code/data/, and rename them to

train.jsonl
dev.jsonl
test.jsonl.bk
feverous_wikiv1.db

Running the Code

prepare the test data

add an id to each test case to make it have the same format as other splits

cd src/my_script/
python add_id_to_test_set.py

Page Retriever

See src/my_methods/bm25_doc_retriever/readme.md for the Page Retriever step

Sentence and Table Evidence Retrieval

The top l sentences and q tables of the selected pages are then scored separately using TF-IDF. We set l=5 and q=3.

Extract sentence evidence

PYTHONPATH=src python  src/my_methods/roberta_sentence_selector/train_roberta_sentence_selector.py --bert_name {bert_name} > log_graph_sentence_selector_large.txt &
PYTHONPATH=src python  src/my_methods/roberta_sentence_selector/pred_sentence_scores.py --test_ckpt {test_ckpt}

Extract table evidence

PYTHONPATH=src python src/baseline/retriever/sentence_tfidf_drqa.py --db data/feverous_wikiv1.db --max_page 5 --max_sent 5 --use_precomputed false --data_path data/ --split {split}
PYTHONPATH=src python src/baseline/retriever/table_tfidf_drqa.py --db data/feverous_wikiv1.db --max_page 5 --max_tabs 3 --use_precomputed false --data_path data/ --split {split}

check the results of table evidence

PYTHONPATH=src python src/baseline/retriever/eval_tab_retriever.py --max_page 5 --max_tabs 3 --split {split}

Check the results of sentence evidence

PYTHONPATH=src python src/baseline/retriever/eval_sentence_retriever.py --max_page 5 --max_sent 5 --split {split}

Combine both retrieved sentences and tables into one file:

PYTHONPATH=src python src/baseline/retriever/combine_retrieval.py --data_path data --max_page 5 --max_sent 5 --max_tabs 3 --split {split}

Evaluate combined results:

PYTHONPATH=src python src/baseline/retriever/eval_combined_retriever.py --max_page 5 --max_sent 5 --max_tabs 3 --split {split}

Build dataset, prepare graphs for each split

PYTHONPATH=src python src/my_methods/graph_evidence_extraction/all_cell_util.py --split {split}

Train

PYTHONPATH=src python src/my_methods/graph_evidence_extraction/train_fusion_col_extractor.py --lr 1e-6 --batch_size 4 --print_freq 100   --use_entity_edges --max_epoch 3 --use_all_cells

Run model on dataset and save scores to output_path

PYTHONPATH=src python src/my_methods/graph_evidence_extraction/rerank_evidence.py  --output_path {} --model_load_path {} --use_entity_edges

Retrieve evidence set with threshold from computed score

PYTHONPATH=src python src/my_methods/graph_evidence_extraction/evidence_retrieval_with_scores.py --split {} --cell_threshold {} --sent_threshold {}

Verdict Prediction please refer to our previous work DCUF

WilliamZR/unifee