This is a BERT-based Transformer model, a state-of-the-art architecture for learning contextual information from text, fine-tuned here to answer questions posed to it. We took a pre-trained BERT-base cased model and fine-tuned it on SQuAD (the Stanford Question Answering Dataset). We achieved an accuracy above 96% with a CPU wall time of around 70 milliseconds.
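As a rough illustration of the task, the sketch below shows extractive question answering with a BERT model fine-tuned on SQuAD, assuming the Hugging Face `transformers` library. The checkpoint named here is a public SQuAD-style model, not the exact model trained in this project:

```python
# Minimal sketch: extractive QA with a SQuAD-fine-tuned BERT model.
# "deepset/bert-base-cased-squad2" is a public checkpoint used only as a
# stand-in for this project's own fine-tuned model.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")
result = qa(
    question="What does SQuAD stand for?",
    context="SQuAD, the Stanford Question Answering Dataset, is a reading "
            "comprehension dataset built from Wikipedia articles.",
)
print(result["answer"], round(result["score"], 3))
```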
To deploy this project, run the training and main notebooks after correctly mounting the drive containing the dataset.
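For reference, a minimal sketch of the drive-mounting step in Google Colab; the dataset path below is a hypothetical placeholder, so adjust it to your own Drive layout:

```python
# Mount Google Drive so the notebooks can read the dataset.
from google.colab import drive

drive.mount("/content/drive")
SQUAD_DIR = "/content/drive/MyDrive/SQuAD"  # hypothetical dataset location
```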
Links to the notebooks can be found here:
- Initial Model
- Trained Model 1 with number of attention heads (n_heads) = 12
- Trained Model 2 with number of attention heads (n_heads) = 8 (see the configuration sketch after this list)
- Training Loss
- A more detailed analysis of the implemented model and strategy can be found in report.pdf and presentation.pptx.
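The two trained variants differ in their attention-head count. Below is a hedged sketch of how that hyperparameter can be varied when constructing a BERT model, assuming the Hugging Face `transformers` library; the project's notebooks may build their models differently:

```python
# Sketch: two BERT QA model configurations differing only in head count.
from transformers import BertConfig, BertForQuestionAnswering

config_12 = BertConfig.from_pretrained("bert-base-cased")  # default: 12 heads
config_8 = BertConfig.from_pretrained("bert-base-cased", num_attention_heads=8)

# hidden_size (768) must be divisible by num_attention_heads; 768 / 8 = 96.
model_8 = BertForQuestionAnswering(config_8)  # randomly initialized, 8 heads
```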