ACL2020 Tutorial: Open-Domain Question Answering

This ACL2020 tutorial was held on July 5th, 2020, by Danqi Chen <danqic@cs.princeton.edu> and Scott Yih <scottyih@fb.com>. You can find all the tutorial materials below, and the live video is available at the ACL website portal.

Overview

This tutorial provides a comprehensive and coherent overview of cutting-edge research in open-domain question answering (QA), the task of answering questions using a large collection of documents on diverse topics. We will start with a brief historical background, discuss the basic setup and core technical challenges of the research problem, and then describe modern datasets along with the common evaluation metrics and benchmarks. The focus will then shift to cutting-edge models proposed for open-domain QA, including two-stage retriever-reader approaches, dense retrievers and end-to-end training, and retriever-free methods. Finally, we will cover some hybrid approaches that use both text and large knowledge bases, and conclude the tutorial with important open questions. We hope that the tutorial will not only help the audience acquire up-to-date knowledge but also provide new perspectives that stimulate the next phase of open-domain QA research.
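
To make the two-stage retriever-reader setup mentioned above concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an illustrative assumption rather than the method of any paper in the reading list: the names `TfidfRetriever`, `read`, and `answer` are hypothetical, the retriever uses a toy TF-IDF score, and the "reader" simply picks the sentence that overlaps most with the question, standing in for the neural readers discussed in the tutorial.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Stage 1 (hypothetical): rank passages by a toy TF-IDF score."""

    def __init__(self, passages):
        self.passages = passages
        self.term_counts = [Counter(tokenize(p)) for p in passages]
        n = len(passages)
        # Document frequency: number of passages containing each term.
        df = Counter(t for counts in self.term_counts for t in counts)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    def retrieve(self, question, k=2):
        """Return the k passages with the highest score for the question."""
        q_tokens = tokenize(question)
        scored = []
        for i, counts in enumerate(self.term_counts):
            score = sum(counts[t] * self.idf.get(t, 0.0) for t in q_tokens)
            scored.append((score, i))
        # Highest score first; ties broken by original passage order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.passages[i] for _, i in scored[:k]]


def read(question, passages):
    """Stage 2 (toy stand-in for a reader): pick the sentence that shares
    the most tokens with the question."""
    q_tokens = set(tokenize(question))
    best, best_overlap = "", -1
    for passage in passages:
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            overlap = len(q_tokens & set(tokenize(sentence)))
            if overlap > best_overlap:
                best, best_overlap = sentence, overlap
    return best


def answer(question, retriever, k=2):
    """Run the two stages in sequence: retrieve, then read."""
    return read(question, retriever.retrieve(question, k=k))


if __name__ == "__main__":
    # Made-up example corpus, for illustration only.
    corpus = [
        "Paris is the capital of France. It lies on the Seine.",
        "Berlin is the capital of Germany.",
        "The Nile is a river in Africa.",
    ]
    print(answer("What is the capital of France?", TfidfRetriever(corpus)))
```

The split into `retrieve` and `read` mirrors the two-stage design: the first stage narrows a large collection to a few candidates cheaply, so that the second, more expensive stage only has to look at those candidates.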

A more detailed introduction can be found here.

Tutorial Slides

Reading List

Early QA work

Answering English questions by computer: a survey. R. F. Simmons. 1965.
The Structure and Performance of an Open-domain Question Answering System. Dan Moldovan, Sanda Harabagiu, Marius Pasca, Rada Mihalcea, Roxana Girju, Richard Goodrum, Vasile Rus. ACL 2000.
An Analysis of the AskMSR Question-answering System. Eric Brill, Susan Dumais and Michele Banko. EMNLP 2002.
Open-Domain Question–Answering. John Prager. 2007.
An Exploration of the Principles Underlying Redundancy-Based Factoid Question Answering. Jimmy Lin. 2007.
Building Watson: An Overview of the DeepQA Project. David Ferrucci, Eric Brown, Jennifer Chu-Carroll et al. 2010.

Recent work (2017+)

Reading Wikipedia to Answer Open-Domain Questions. Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes. ACL 2017.
R^3: Reinforced Reader-Ranker for Open-Domain Question Answering. Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, Jing Jiang. AAAI 2018.
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering. Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell. ICLR 2018.
Denoising Distantly Supervised Open-domain Question Answering. Yankai Lin, Haozhe Ji, Zhiyuan Liu, Maosong Sun. ACL 2018.
Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text. Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, William Cohen. EMNLP 2018.
Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. OpenAI 2019.
End-to-end Open-domain Question Answering with BERTserini. Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin. NAACL 2019 (demonstration).
Latent Retrieval for Weakly Supervised Open Domain Question Answering. Kenton Lee, Ming-Wei Chang, Kristina Toutanova. ACL 2019.
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index. Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi. ACL 2019.
Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader. Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang. ACL 2019.
A Discrete Hard EM Approach for Weakly Supervised Question Answering. Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer. EMNLP 2019.
Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering. Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, Bing Xiang. EMNLP 2019.
PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text. Haitian Sun, Tania Bedrax-Weiss, William W. Cohen. EMNLP 2019.
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering. Sewon Min, Danqi Chen, Luke Zettlemoyer, Hannaneh Hajishirzi. arXiv 2019.
Contextualized Sparse Representations for Real-Time Open-Domain Question Answering. Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang. ACL 2020.
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering. Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong. ICLR 2020.
REALM: Retrieval-Augmented Language Model Pre-Training. Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang. ICML 2020.
Dense Passage Retrieval for Open-Domain Question Answering. Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih. arXiv 2020.
How Much Knowledge Can You Pack Into the Parameters of a Language Model? Adam Roberts, Colin Raffel, Noam Shazeer. arXiv 2020.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela. arXiv 2020.
Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder et al. arXiv 2020.

We understand this is a long reading list :) In case you wonder where to start, we plan to discuss the papers in bold in depth during our tutorial.

jinfengr/acl2020-openqa-tutorial