Amazon question/answer dataset


This dataset contains question-and-answer data from Amazon, totaling around 1.4 million answered questions. The data were collected and made available by Prof. Julian McAuley of UCSD (University of California, San Diego).


The data were acquired from Prof. McAuley's website. Each category was distributed as an independent archive file (each question being a separate JSON file in the archive). I joined them and added the attribute Category for ease of use. Please go through the ICDM paper in particular; it goes into great detail on the potential of this dataset, and it was one of my major inspirations for creating it.
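
The joining step described above can be sketched roughly as follows. This is only an illustration, not the exact script used to build the dataset: the function name merge_categories, the sample category names, and the record fields "question" and "answer" are assumptions made for the example, and the toy records are made up.

```python
from typing import Dict, Iterable, List


def merge_categories(records_by_category: Dict[str, Iterable[dict]]) -> List[dict]:
    """Combine per-category records into one list, tagging each with its category."""
    merged = []
    for category, records in records_by_category.items():
        for record in records:
            tagged = dict(record)          # copy so the input record is not mutated
            tagged["Category"] = category  # the added attribute
            merged.append(tagged)
    return merged


# Toy, made-up records standing in for the parsed per-category archives.
sample = {
    "Electronics": [{"question": "Does it include a cable?", "answer": "Yes."}],
    "Books": [{"question": "Is this the hardcover?", "answer": "No, paperback."}],
}
merged = merge_categories(sample)
```

Copying each record before tagging keeps the per-category inputs unchanged, so the same parsed data can be reused for other purposes.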


Modeling ambiguity, subjectivity, and diverging viewpoints in opinion question answering systems
Mengting Wan, Julian McAuley
International Conference on Data Mining (ICDM), 2016

Addressing complex and subjective product-related queries with customer reviews
Julian McAuley, Alex Yang
World Wide Web (WWW), 2016