Project Objective: This project develops an extractive question-answering (QA) system by fine-tuning BERT (Bidirectional Encoder Representations from Transformers) on the Stanford Question Answering Dataset (SQuAD) 2.0. The work covers pre-processing the SQuAD 2.0 dataset, training a BERT-based QA model, and evaluating its accuracy on both the training and validation sets.
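As a rough illustration of the pre-processing step, the sketch below flattens SQuAD-style nested JSON into one record per question. The function name `flatten_squad`, the output field names, and the toy `sample` are assumptions made for this example and are not taken from the project code. The nested keys (`data`, `paragraphs`, `qas`, `is_impossible`, `answers`, `answer_start`) follow the published SQuAD 2.0 file layout.

```python
from typing import Any, Dict, List


def flatten_squad(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten SQuAD-style nested JSON into one record per question."""
    records = []
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                impossible = qa.get("is_impossible", False)
                answers = qa.get("answers", [])
                # Unanswerable questions carry no gold answer span.
                first = answers[0] if answers and not impossible else None
                records.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "is_impossible": impossible,
                    "answer_text": first["text"] if first else "",
                    "answer_start": first["answer_start"] if first else -1,
                })
    return records


# Toy input in the SQuAD 2.0 layout (illustrative, not real dataset rows).
CONTEXT = "BERT was introduced by researchers at Google in 2018."
ANSWER = "researchers at Google"
sample = {
    "version": "v2.0",
    "data": [{
        "title": "BERT",
        "paragraphs": [{
            "context": CONTEXT,
            "qas": [
                {"id": "q1", "question": "Who introduced BERT?",
                 "is_impossible": False,
                 "answers": [{"text": ANSWER,
                              "answer_start": CONTEXT.index(ANSWER)}]},
                {"id": "q2", "question": "Who introduced GPT?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

records = flatten_squad(sample)
for record in records:
    print(record["id"], record["is_impossible"], repr(record["answer_text"]))
```

Keeping the character offset (`answer_start`) alongside the answer text matters because a later tokenization step has to map that character span onto token positions.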
Key Components:
- Data Loading and Pre-processing
- BERT Model Fine-tuning
- Training and Evaluation
- Custom Query Prediction
- Model Evaluation Metrics
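The evaluation metrics are not spelled out here. SQuAD submissions are conventionally scored with exact match (EM) and token-level F1, so the sketch below assumes those two. The helper names (`normalize_answer`, `exact_match`, `f1_score`) are chosen for this example; the normalization order (lowercase, strip punctuation, drop articles, collapse whitespace) mirrors the convention of the official SQuAD evaluation script, not necessarily this project's code.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # An empty string stands for "no answer": full credit only if both
    # the prediction and the gold answer are empty.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(f1_score("in Paris France", "Paris"))
```

EM rewards only answers that match after normalization, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.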
Conclusion:
This project walks through fine-tuning a BERT model for question answering on the SQuAD 2.0 dataset. The fine-tuned model answers user queries based on the provided context. The project contributes to the development of natural language processing applications and illustrates the power of pre-trained transformer models for QA tasks.
SQuAD (the Stanford Question Answering Dataset) is the most widely used academic benchmark for extractive question answering: https://rajpurkar.github.io/SQuAD-explorer/
Dataset Summary:
SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.