/Awesome-Table-QA

A comprehensive paper list of Table-based Question Answering.

Awesome Table Question Answering


🔥🔥🔥 An awesome paper list of Table-based Question Answering.

Paper

Dataset

Single-Turn

  1. Compositional Semantic Parsing on Semi-Structured Tables WikiTableQuestions 2015

    [Paper] [Code] Panupong Pasupat, Percy Liang

  2. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning WikiSQL 2017

    [Paper] [Code] Victor Zhong, Caiming Xiong, Richard Socher

  3. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task Spider EMNLP 2018

    [Paper] [Code] Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev

  4. On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries SQUALL EMNLP-Findings 2020

    [Paper] [Code] Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III, Lillian Lee

  5. HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data HybridQA EMNLP-Findings 2020

    [Paper] [Code] Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, William Yang Wang

  6. TSQA: tabular scenario based question answering GeoTSQA AAAI 2021

    [Paper] [Code] Xiao Li, Yawei Sun, Gong Cheng

  7. TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance TAT-QA ACL 2021

    [Paper] [Code] Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, Tat-Seng Chua

  8. Open Domain Question Answering over Tables via Dense Retrieval NQ-Tables NAACL 2021

    [Paper] [Code] Jonathan Herzig, Thomas Müller, Syrine Krichene, Julian Eisenschlos

  9. Open Question Answering over Tables and Text OTT-QA ICLR 2021

    [Paper] [Code] Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Wang, William W. Cohen

  10. MultiModalQA: complex question answering over text, tables and images MultimodalQA ICLR 2021

    [Paper] [Code] Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant

  11. FinQA: A Dataset of Numerical Reasoning over Financial Data FinQA EMNLP 2021

    [Paper] [Code] Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

  12. HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation HiTab ACL 2022

    [Paper] [Code] Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang

  13. FeTaQA: Free-form Table Question Answering FeTaQA TACL 2022

    [Paper] [Code] Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Hailey Schoelkopf, Riley Kong, Xiangru Tang, Mutethia Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev

  14. MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data MultiHiertt ACL 2022

    [Paper] [Code] Yilun Zhao, Yunxiang Li, Chenying Li, Rui Zhang

  15. Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning TAT-HQA ACL 2022

    [Paper] Moxin Li, Fuli Feng, Hanwang Zhang, Xiangnan He, Fengbin Zhu, Tat-Seng Chua

  16. Towards Complex Document Understanding By Discrete Reasoning TAT-DQA ACM MM 2022

    [Paper] Fengbin Zhu, Wenqiang Lei, Fuli Feng, Chao Wang, Haozhou Zhang, Tat-Seng Chua

  17. AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry AIT-QA NAACL 2022

    [Paper] [Code] Yannis Katsis, Saneem Chemmengath, Vishwajeet Kumar, Samarth Bharadwaj, Mustafa Canim, Michael Glass, Alfio Gliozzo, Feifei Pan, Jaydeep Sen, Karthik Sankaranarayanan, Soumen Chakrabarti

  18. ToTTo: A Controlled Table-To-Text Generation Dataset ToTTo EMNLP 2020

    [Paper] [Code] Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

  19. Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table Open-WikiTable ACL-Findings 2023

    [Paper] Sunjun Kweon, Yeonsu Kwon, Seonhee Cho, Yohan Jo, Edward Choi

Multiple-Turn

  1. PACIFIC: Towards proactive conversational question answering over tabular and textual data in finance PACIFIC EMNLP 2022

    [Paper] [Code] Yang Deng, Wenqiang Lei, Wenxuan Zhang, Wai Lam, Tat-Seng Chua

  2. ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering ConvFinQA EMNLP 2022

    [Paper] [Code] Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, William Yang Wang

  3. HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data HybriDialogue EMNLP-Findings 2022

    [Paper] [Code] Kai Nakamura, Sharon Levy, Yi-Lin Tuan, Wenhu Chen, William Yang Wang

  4. MMCoQA: Conversational Question Answering over Text, Tables, and Images MMCoQA ACL 2022

    [Paper] [Code] Yongqi Li, Wenjie Li, Liqiang Nie

  5. CoQA: A Conversational Question Answering Challenge CoQA TACL 2019

    [Paper] [Code] Siva Reddy, Danqi Chen, Christopher D. Manning

Methods

Table Pretraining (TaLMs)

  1. TAPEX: Table pre-training via learning a neural SQL executor ICLR 2022

    WikiSQL, WikiTableQuestions, SQA

    [Paper] [Code] Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou

  2. OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering NAACL 2022

    WikiSQL, WikiTableQuestions

    [Paper] [Code] Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, Weizhu Chen

  3. ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples EMNLP 2022

    WikiSQL, WikiTableQuestions

    [Paper] [Code] Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev

  4. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data ACL 2020

    WikiTableQuestions, Spider

    [Paper] [Code] Pengcheng Yin, Graham Neubig, Wen-tau Yih, Sebastian Riedel

  5. MATE: Multi-view Attention for Table Transformer Efficiency EMNLP 2021

    WikiTableQuestions, HybridQA

    [Paper] Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen

LLM-based Methods

  1. Binding Language Models in Symbolic Languages ICLR 2023

    WikiSQL, WikiTableQuestions, MultimodalQA

    [Paper] [Code] Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu

  2. Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning SIGIR 2023

    WikiSQL, WikiTableQuestions

    [Paper] Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, Yongbin Li

  3. Large language models are few (1)-shot table reasoners EACL-Findings 2023

    WikiTableQuestions, FeTaQA

    [Paper] Wenhu Chen

  4. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks Arxiv 2023

    WikiTableQuestions, FeTaQA

    [Paper] Wenhu Chen, Xueguang Ma, Xinyi Wang, William W Cohen

  5. StructGPT: A general framework for large language model to reason over structured data Arxiv 2023

    WikiSQL, WikiTableQuestions

    [Paper] Jinhao Jiang, Kun Zhou, Zican Dong, Keming Ye, Wayne Xin Zhao, Ji-Rong Wen

  6. LEVER: Learning to Verify Language-to-Code Generation with Execution ICML 2023

    WikiTableQuestions

    [Paper] Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida Wang, Xi Victoria Lin

  7. Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data

    WikiTableQuestions

    [Paper] Carlos Gemmell, Jeffrey Dalton

Retrieval-then-Read Methods

Multi-hop

  1. MATE: Multi-view Attention for Table Transformer Efficiency EMNLP 2021

    WikiTableQuestions, HybridQA

    [Paper] Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen

  2. Multi-Row, Multi-Span Distant Supervision For Table+Text Question Answering MITQA ACL 2023

    HybridQA, OTT-QA

    [Paper] Vishwajeet Kumar, Yash Gupta, Saneem Chemmengath, Jaydeep Sen, Soumen Chakrabarti, Samarth Bharadwaj, Feifei Pan

  3. Reasoning over hybrid chain for table-and-text open domain question answering CARP IJCAI 2022

    OTT-QA

    [Paper] Wanjun Zhong, Junjie Huang, Qian Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

  4. Multi-hop open-domain question answering over structured and unstructured knowledge DEHG NAACL-Findings 2022

    HybridQA

    [Paper] Yue Feng, Zhen Han, Mingming Sun, Ping Li

  5. Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA OTTeR EMNLP-Findings 2022

    OTT-QA

    [Paper] Junjie Huang, Wanjun Zhong, Qian Liu, Ming Gong, Daxin Jiang, Nan Duan

  6. MuGER2: Multi-Granularity Evidence Retrieval and Reasoning for Hybrid Question Answering MuGER2 EMNLP-Findings 2022

    HybridQA

    [Paper] Yingyao Wang, Junwei Bao, Chaoqun Duan, Youzheng Wu, Xiaodong He, Tiejun Zhao

  7. TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering TACR ACL-Findings 2023

    HybridQA

    [Paper] Jian Wu, Yicheng Xu, Yan Gao, Jian-Guang Lou, Börje F. Karlsson, Manabu Okumura

  8. MAFiD: Moving Average Equipped Fusion-in-Decoder for Question Answering over Tabular and Textual Data MAFiD EACL-Findings 2023

    HybridQA

    [Paper] Sung-Min Lee, Eunhwan Park, Daeryong Seo, Donghyeon Jeon, Inho Kang, Seung-Hoon Na

  9. S3HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering S3HQA ACL 2023

    HybridQA

    [Paper] Fangyu Lei, Xiang Li, Yifan Wei, Shizhu He, Yiming Huang, Jun Zhao, Kang Liu

Open-Domain

  1. Reasoning over hybrid chain for table-and-text open domain question answering CARP IJCAI 2022

    OTT-QA

    [Paper] Wanjun Zhong, Junjie Huang, Qian Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

  2. Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA OTTeR EMNLP-Findings 2022

    OTT-QA

    [Paper] Junjie Huang, Wanjun Zhong, Qian Liu, Ming Gong, Daxin Jiang, Nan Duan

  3. Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge CORE EMNLP-Findings 2022

    OTT-QA

    [Paper] Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

  4. Chain-of-Skills: A Configurable Model for Open-domain Question Answering COS ACL 2023

    OTT-QA

    [Paper] Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

Numerical Reasoning

  1. FinQA: A Dataset of Numerical Reasoning over Financial Data FinQANet EMNLP 2021

    FinQA

    [Paper] [Code] Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

  2. APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning APOLLO Arxiv 2023

    FinQA, ConvFinQA

    [Paper] Jiashuo Sun, Hang Zhang, Chen Lin, Yeyun Gong, Jian Guo, Nan Duan

  3. DyRRen: A dynamic retriever-reranker-generator model for numerical reasoning over tabular and textual data DyRRen AAAI 2023

    FinQA

    [Paper] Xiao Li, Yin Zhu, Sichen Liu, Jiangzhou Ju, Yuzhong Qu, Gong Cheng

  4. MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data MT2Net ACL 2022

    MultiHiertt

    [Paper] [Code] Yilun Zhao, Yunxiang Li, Chenying Li, Rui Zhang

  5. Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context ACL-Findings 2023

    TAT-QA, TAT-HQA

    [Paper] Moxin Li, Wenjie Wang, Fuli Feng, Hanwang Zhang, Qifan Wang, Tat-Seng Chua

  6. NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering NAPG Arxiv 2023

    MultiHiertt

    [Paper] Tengxun Zhang, Hongfei Xu, Josef van Genabith, Deyi Xiong, Hongying Zan

Multimodal Reasoning

  1. MultiModalQA: complex question answering over text, tables and images ImplicitDecomp ICLR 2021

    [Paper] [Code] Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant

  2. MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text MuRAG EMNLP 2022

    MultimodalQA

    [Paper] Wenhu Chen, Hexiang Hu, Xi Chen, Pat Verga, William Cohen

  3. Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills PReasM ACL 2022

    MultimodalQA

    [Paper] Ori Yoran, Alon Talmor, Jonathan Berant

Non-Retrieval Methods

  1. TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data EMNLP 2022

    TAT-QA, WikiTableQuestions

    [Paper] Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Fan Cheng, Shi Han, Dongmei Zhang

  2. Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder COLING 2022

    TAT-QA

    [Paper] Fangyu Lei, Shizhu He, Xiang Li, Jun Zhao, Kang Liu

  3. UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation EMNLP 2022

    TAT-QA

    [Paper] Yongwei Zhou, Junwei Bao, Chaoqun Duan, Youzheng Wu, Xiaodong He, Tiejun Zhao

  4. Multi-View Graph Representation Learning for Answering Hybrid Numerical Reasoning Question Arxiv 2023

    TAT-QA

    [Paper] Yifan Wei, Fangyu Lei, Yuanzhe Zhang, Jun Zhao, Kang Liu

Existing Survey

  1. A survey on table question answering: recent advances 2022

    [Paper] Nengzheng Jin, Joanna Siebert, Dongfang Li, Qingcai Chen

  2. A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions 2022.12

    [Paper] Dingzirui Wang, Longxu Dou, Wanxiang Che

  3. A Survey on Neural Data-to-Text Generation

    [Paper] Yupian Lin, Tong Ruan, Jingping Liu, Haofen Wang

  4. A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

    [Paper] Bowen Qin, Binyuan Hui, Lihan Wang, Min Yang, Jinyang Li, Binhua Li, Ruiying Geng, Rongyu Cao, Jian Sun, Luo Si, Fei Huang, Yongbin Li