Insurance QA data formatted as Python objects and pickled.
Clone locally
git clone https://github.com/codekansas/insurance_qa_python.git
cd insurance_qa_python
pwd # where files are stored
Getting QA format with the files
import pickle
def get_pickle(filename):
return pickle.load(open(filename, 'rb'))
vocab = get_pickle('vocabulary')
def translate_sent(sent):
return [vocab[word] for word in sent]
dev = get_pickle('dev')
answers = get_pickle('answers')
def get_answer(answer_id):
return translate_sent(answers[answer_id])
for data_item in dev:
for bad_answer in data_item['bad']:
print('Question:', translate_sent(data_item['question']))
print('Good Answer:', get_answer(data_item['good'][0]))
print('Bad Answer: ', get_answer(bad_answer), '\n============')
About files:
vocabulary
:dict
object of(word index <int> -> word <str>)
relationshipsanswers
:dict
object of(answer index <int> -> word indices <list of ints>)
relationshipstrain
:list
ofdict
(onedict
per entry), where eachdict
has:question
: the word indices for the questionanswers
: the answer indices for each of the question's ground truth
dev / test1 / test2
:list
ofdict
(onedict
per entry), where eachdict
has:question
: the word indices for the questiongood
: the ground truthbad
: the other answers from the dataset
Applying Deep Learning to Answer Selection: A Study and An Open Task
Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou ASRU 2015