BERT Based QA System on kaggleQA Data set

This repository demonstrates BERT for downstream Question Answering Task on KaggleQA Dataset https://www.kaggle.com/rtatman/questionanswer-dataset

It gives step by step instructions to use Bert with default pretrained weights and hypere parameters, due to complexity of task and space, its tested on one example only

How to use these scripts?

The main script for this project is run_squad.py. To use this file to run prediction, make sure you have

one input json file with the following format:

{
  "data": [
    {
      "paragraphs": [
        {
          "context": "This is the text of document",
          "id": "Unique identifier to recognize document, etc: title of document",
          "qas": [
            {
              "question": "Question 1",
              "answers": "Answer 1 (This is optional, only useful when u want to train BERT QA model on this dataset)",
              "id": "Unique identifier to recognize this pair of question and answer."
            }
          ]
        }
      ]
    }
  ]
}

There is a ipynb notebook, pre_process_squad.ipynb which might be useful to help you convert you dataset to this specific format if your dataset is similar to the structure of this dataset:https://www.kaggle.com/rtatman/questionanswer-dataset

pretrained-weights are downloaded from BERT official github: https://github.com/google-research/bert#pre-trained-models, downloaded files should be put into folder pretrained-weights

After that, run the following command from Terminal:

python run_squad.py --bert_config_file='pretrained-weights/uncased_L-12_H-768_A-12/bert_config.json' --output_dir='output/' --predict_file='input/test.json' --init_checkpoint='pretrained-weights/uncased_L-12_H-768_A-12/bert_model.ckpt' --vocab_file='pretrained-weights/uncased_L-12_H-768_A-12/vocab.txt' --do_predict

to generate a output json file predictions.json inside output directory. (See usage_example/ if the above instruction is still not clear for you)this folder contains one example for training and testing

samreenkazi/BERT_KAGGLEQA

BERT Based QA System on kaggleQA Data set

How to use these scripts?