Awesome Reasoning over Tables

A comprehensive list of papers on reasoning over tables.

Table Representation Learning

Graph Representation

  • A Graph Representation of Semi-structured Data for Web Question Answering (COLING 2020) [Paper]
  • Retrieving Complex Tables with Multi-Granular Graph Representation Learning (SIGIR 2021) [Paper][Github]

Transformer-Based / Pre-training

2020

  • TaBERT: Learning Contextual Representations for Natural Language Utterances and Structured Tables (ACL 2020) [Paper][Github]
  • TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL 2020) [Paper][Github]

2021

  • TABBIE: Pretrained Representations of Tabular Data (NAACL 2021) [Paper][Github]
  • Capturing Row and Column Semantics in Transformer Based Question Answering over Tables (NAACL 2021) [Paper][Github]
  • GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (ICLR 2021) [Paper][Huggingface]
  • TUTA: Tree-based Transformers for Generally Structured Table Pre-training (KDD 2021) [Paper]
  • MATE: Multi-view Attention for Table Transformer Efficiency (EMNLP 2021) [Paper][Github]
  • Understanding tables with intermediate pre-training (EMNLP-findings 2021) [Paper][Github]

2022

  • UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models (EMNLP 2022) [Paper][Github]
  • TAPEX: Table Pre-training via Learning a Neural SQL Executor (ICLR 2022) [Paper][Github]
  • TableFormer: Robust Transformer Modeling for Table-Text Encoding (ACL 2022) [Paper][Github]
  • OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering (NAACL 2022) [Paper][Github]
  • ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples (EMNLP 2022) [Paper][Github]
  • Table-To-Text generation and pre-training with TabT5 (EMNLP-findings 2022) [Paper]
  • STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing (EMNLP 2022) [Paper]

Potential of Large LMs / Chain-of-Thought

2022

  • Large Language Models are few(1)-shot Table Reasoners (Pre-print) [Paper]

Controllable Table-to-Text Generation

Datasets/Benchmark

2020

  • Logical Natural Language Generation from Open-Domain Tables (ACL 2020) [Paper][Github]
  • ToTTo: A Controlled Table-To-Text Generation Dataset (EMNLP 2020) [Paper][Github]

2021

  • SciGen: a Dataset for Reasoning-Aware Text Generation from Scientific Tables (NeurIPS 2021) [Paper][Github]
  • HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation (Pre-print 2021) [Paper]
  • TWT: Table with Written Text for Controlled Data-to-Text Generation (EMNLP-findings 2021) [Paper][Github]

Methodology

2021

  • Towards Table-to-Text Generation with Numerical Reasoning (ACL 2021) [Paper]
  • De-Confounded Variational Encoder-Decoder for Logical Table-to-Text Generation (ACL 2021) [Paper]
  • Few-Shot Table-to-Text Generation with Prototype Memory (EMNLP-findings 2021) [Paper]
  • Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots (EMNLP-findings 2021) [Paper][Github]

2022

  • Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning (NAACL 2022) [Paper][Github]
  • R2D2: Robust Data-to-Text with Replacement Detection (EMNLP 2022) [Paper][Github]
  • PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation (EMNLP 2022) [Paper][Github]
  • Diversity Enhanced Table-to-Text Generation via Type Control (Pre-print) [Paper]

Reasoning over Tabular Data

Datasets/Benchmark

2020

  • TabFact: A Large-scale Dataset for Table-based Fact Verification (ICLR 2020) [Paper][Github]

2021

  • FeTaQA: Free-form Table Question Answering (TACL 2021) [Paper][Github]
  • HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation (Pre-print 2021) [Paper]

Methodology

2021

  • Joint Verification and Reranking for Open Fact Checking Over Tables (ACL 2021) [Paper][Github]
  • Logic-level Evidence Retrieval and Graph-based Verification Network for Table-based Fact Verification (EMNLP 2021) [Paper][Blank Github]
  • Exploring Decomposition for Table-based Fact Verification (EMNLP-findings 2021) [Paper]
  • Table-based Fact Verification With Salience-aware Learning (EMNLP-findings 2021) [Paper][Github]

2022

  • Learning to Generate Programs for Table Fact Verification via Structure-Aware Semantic Parsing (ACL 2022) [Paper][Github]

Reasoning over Tabular and Textual Data (Hybrid Data)

Datasets/Benchmark

2020

  • HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data (EMNLP-findings 2020) [Paper][Github]

2021

  • Open Question Answering over Tables and Text (ICLR 2021) [Paper][Github]
  • MultiModalQA: Complex Question Answering over Text, Tables and Images (ICLR 2021) [Paper]
  • TSQA: Tabular Scenario Based Question Answering (AAAI 2021) [Paper][Github]
  • TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance (ACL 2021) [Paper][Github]
  • FinQA: A Dataset of Numerical Reasoning over Financial Data (EMNLP 2021) [Paper][Github]
  • FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information (NeurIPS-benchmark 2021) [Paper][Github]

2022

  • MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data (ACL 2022) [Paper][Github]
  • Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning (ACL 2022) [Paper]
  • HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data (ACL-findings 2022) [Paper][Github]

Methodology

2021

  • Open Domain Question Answering over Tables via Dense Retrieval (NAACL 2021) [Paper][Github]

2022

  • FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports (LREC 2022) [Paper]
  • Towards Complex Document Understanding By Discrete Reasoning (MM 2022) [Paper][Github]
  • Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder (COLING 2022) [Paper][Github]
  • TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data (EMNLP 2022) [Paper][Github]
  • MuGER2: Multi-Granularity Evidence Retrieval and Reasoning for Hybrid Question Answering (EMNLP-findings 2022) [Paper][Github]

Other directions

Robustness

  • Towards Robustness of Text-to-SQL Models against Synonym Substitution (ACL 2021) [Paper][Github]
  • Topic Transferable Table Question Answering (EMNLP 2021) [Paper][Github]
  • Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion (ACL 2022) [Paper][Github]
  • Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness (ICLR 2023 submission) [Paper]

Tutorials

2021

  • KDD 2021 Tutorial: From Tables to Knowledge: Recent Advances in Table Understanding [Website]
  • EMNLP 2021 Tutorial: Knowledge-Enriched Natural Language Generation [Website]

Contributing

Feel free to open a pull request or email Yilun Zhao (yilun.zhao@yale.edu) with any updates.