Main Repository for Muralidharan et al., "PlacesQA: Towards Automatic Answering of Questions on the Web"
PlacesQA: Towards Automatic Answering of Questions on the Web. Srikanth Muralidharan, Akash Abdu Jyothi, Nelson Nauata, Fred Tung, Greg Mori. arXiv version
Web users often post questions: "Does hotel X have a pool?", "Is museum Y wheelchair accessible?". The potential to automate the answering process presents an exciting challenge for AI systems, with many practical applications. However, to the best of our knowledge, there are not yet any public datasets for general question answering on the web. In this paper, we introduce the PlacesQA dataset, which contains 9,750 questions and answers about 750 unique places, including hotels, museums, and nightlife venues, derived from questions asked by real users of travel websites. This dataset serves as a testbed for general question answering. For concreteness, we also provide two sets of 73,148 and 181,266 images of these 750 places, obtained via web searches. We show that images of these places on the web provide a rich source of information that can potentially be leveraged by an automatic question answering agent.
Figure 1. This paper takes a first step towards general question answering on the web (middle), in which an AI agent is given a user question and is tasked with acquiring relevant images (and other complementary modes of information) from the web to produce an accurate answer. Our PlacesQA dataset consists of "canonical" questions and answers covering 750 unique places, including hotels, museums, and nightlife venues. The visual QA example is from Antol et al., ICCV'15.
Figure 2. An illustrative example of real-world questions and answers, where Google image search results provide evidence for the answers.
You can download the Google search images here.
You can download the Facebook images here.
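To give a sense of the annotations before downloading, the snippet below sketches one plausible Python representation of a single question-answer record. The field names are illustrative assumptions on our part, not the schema of the released files.

```python
# Illustrative only: these field names are assumptions, not
# necessarily the format of the released dataset files.
example_record = {
    "place_id": "hotel_0042",   # hypothetical place identifier
    "category": "hotels",       # hotels, museums, or nightlife
    "question": "Does this hotel have a pool?",
    "answer": "yes",            # canonical questions have yes/no answers
}
```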
Dataset Statistics
Category | No. of Places | No. of Canonical Questions |
---|---|---|
Hotels | 250 | 18 |
Museums | 250 | 15 |
Nightlife | 250 | 6 |
Table 1. Statistics of the PlacesQA dataset.
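As a sanity check, the counts in Table 1 account for all 9,750 question-answer pairs mentioned in the abstract, assuming each canonical question is answered for every place in its category:

```python
# (places, canonical questions) per category, taken from Table 1.
counts = {"Hotels": (250, 18), "Museums": (250, 15), "Nightlife": (250, 6)}
total_places = sum(p for p, _ in counts.values())
total_qa = sum(p * q for p, q in counts.values())
print(total_places, total_qa)  # 750 9750
```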
Figure 3. Left: We propose a new permutation-invariant fusion operator for sets that generalizes common pooling approaches, such as max, mean, and "max-min" pooling, and that can be learned end-to-end. Right: Late fusion model with generalized set pooling.
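For concreteness, here is a minimal PyTorch sketch of one way to parameterize such a learned, permutation-invariant set pooling operator. The class name and the specific per-channel combination of max, min, and mean below are our illustrative assumptions, not necessarily the exact operator from the paper.

```python
import torch
import torch.nn as nn

class GeneralizedSetPooling(nn.Module):
    """Permutation-invariant pooling over a set of feature vectors."""

    def __init__(self, num_channels):
        super().__init__()
        # One learnable weight per channel for each basic pooling mode.
        self.a = nn.Parameter(torch.ones(num_channels))   # max weight
        self.b = nn.Parameter(torch.zeros(num_channels))  # min weight
        self.c = nn.Parameter(torch.zeros(num_channels))  # mean weight

    def forward(self, x):
        # x: (batch, set_size, channels). Pooling over dim 1 is invariant
        # to any permutation of the set elements.
        x_max = x.max(dim=1).values
        x_min = x.min(dim=1).values
        x_mean = x.mean(dim=1)
        # a=1, b=0, c=0 recovers max pooling; c=1 (a=b=0) recovers mean
        # pooling; a=1, b=-1 recovers "max-min" pooling.
        return self.a * x_max + self.b * x_min + self.c * x_mean

# Example: fuse per-image CNN features from a set of images of one place.
pool = GeneralizedSetPooling(num_channels=512)
features = torch.randn(4, 30, 512)  # 4 places, 30 images each
fused = pool(features)              # (4, 512): one descriptor per place
```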
Method | Hotels: Accuracy (%) | Hotels: Wins vs. Losses | Museums: Accuracy (%) | Museums: Wins vs. Losses |
---|---|---|---|---|
Majority | 72.1 | n/a | 70.2 | n/a |
Max Pooling | 72.2 | 3 vs. 2 | 69.1 | 0 vs. 5 |
Mean Pooling | 73.5 | 5 vs. 1 | 69.4 | 1 vs. 2 |
Generalized (Ours) | 74.9 | 7 vs. 1 | 69.5 | 1 vs. 3 |

Method | Nightlife: Accuracy (%) | Nightlife: Wins vs. Losses | Overall: Accuracy (%) | Overall: Wins vs. Losses |
---|---|---|---|---|
Majority | 64.0 | n/a | 70.1 | n/a |
Max Pooling | 63.7 | 0 vs. 1 | 69.7 | 3 vs. 8 |
Mean Pooling | 64.0 | 1 vs. 2 | 70.4 | 7 vs. 5 |
Generalized (Ours) | 66.0 | 3 vs. 0 | 71.4 | 11 vs. 4 |

Table 2. Summary of the results obtained using traditional set fusion methods and our learned generalized set fusion, using Google search images. Wins (or losses) indicates the number of questions for which the method performs better (or worse) than answering the majority answer (yes/no) for a particular question.
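The sketch below makes the wins-vs-losses bookkeeping concrete under our reading of the caption. The function name, data layout, and toy question ids are hypothetical, not taken from the released code.

```python
from typing import Dict, List, Tuple

def wins_vs_losses(pred: Dict[str, List[str]],
                   gt: Dict[str, List[str]]) -> Tuple[int, int]:
    """Count questions where a method beats / trails the majority baseline.

    pred and gt map a question id to per-place "yes"/"no" answers.
    """
    wins = losses = 0
    for q, answers in gt.items():
        majority = max(set(answers), key=answers.count)  # per-question majority
        maj_acc = sum(a == majority for a in answers) / len(answers)
        acc = sum(p == a for p, a in zip(pred[q], answers)) / len(answers)
        if acc > maj_acc:
            wins += 1
        elif acc < maj_acc:
            losses += 1
    return wins, losses

# Toy usage with hypothetical question ids:
gt = {"has_pool": ["yes", "yes", "no"], "has_gym": ["no", "no", "no"]}
pred = {"has_pool": ["yes", "yes", "yes"], "has_gym": ["no", "yes", "no"]}
print(wins_vs_losses(pred, gt))  # (0, 1)
```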
Method | Hotels: Accuracy (%) | Hotels: Wins vs. Losses | Museums: Accuracy (%) | Museums: Wins vs. Losses |
---|---|---|---|---|
Majority | 72.1 | n/a | 70.2 | n/a |
Max Pooling | 72.7 | 3 vs. 2 | 69.9 | 0 vs. 1 |
Mean Pooling | 72.5 | 4 vs. 2 | 69.8 | 0 vs. 2 |
Generalized (Ours) | 74.3 | 8 vs. 3 | 70.0 | 2 vs. 3 |

Method | Nightlife: Accuracy (%) | Nightlife: Wins vs. Losses | Overall: Accuracy (%) | Overall: Wins vs. Losses |
---|---|---|---|---|
Majority | 64.0 | n/a | 70.1 | n/a |
Max Pooling | 63.7 | 1 vs. 1 | 70.3 | 4 vs. 4 |
Mean Pooling | 64.0 | 1 vs. 1 | 70.1 | 5 vs. 5 |
Generalized (Ours) | 63.7 | 1 vs. 1 | 71.1 | 11 vs. 7 |

Table 3. Summary of the results obtained using traditional set fusion methods and our learned generalized set fusion, using Facebook images. Wins (or losses) indicates the number of questions for which the method performs better (or worse) than answering the majority answer (yes/no) for a particular question.
The source code is released under the BSD 2-Clause license.
If you use our dataset, please cite:

Srikanth Muralidharan, Akash Abdu Jyothi, Nelson Nauata, Fred Tung, Greg Mori. PlacesQA: Towards Automatic Answering of Questions on the Web. arXiv preprint.