My GitHub: https://github.com/Mejorarsim
This project focuses on building and evaluating multiple-choice question-answering models using the WikiQA corpus. Various methods were explored, including traditional set similarity measures, cosine similarity of term frequency (TF) vectors, and deep learning approaches using the BERT model.
The WikiQA corpus, used in this project, consists of train, validation, and test splits:
- Train Split: WikiQA-Train
- Validation Split: WikiQA-Validation
- Test Split: WikiQA-Test
The dataset contains questions and multiple answer options, with one correct answer per question.
Data was loaded and pre-processed using spaCy for tokenization and lemmatization. Each dataset split was analyzed for the number of questions and options, and tokenization statistics were computed (a sketch follows this list), including:
- Average number of tokens per question.
- Average number of tokens per choice.
- Average number of tokens per correct choice.
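A minimal sketch of this preprocessing step is shown below. The in-memory format (dicts with a `question` string, a list of `options`, and a `label` index for the correct option) and the helper names are assumptions for illustration, not the project's actual API.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

def lemmatize(text):
    """Tokenize and lemmatize with spaCy, dropping punctuation and whitespace."""
    return [tok.lemma_.lower() for tok in nlp(text)
            if not tok.is_punct and not tok.is_space]

def token_stats(examples):
    """Average token counts per question, per choice, and per correct choice."""
    q_lens = [len(lemmatize(ex["question"])) for ex in examples]
    c_lens = [len(lemmatize(opt)) for ex in examples for opt in ex["options"]]
    gold_lens = [len(lemmatize(ex["options"][ex["label"]])) for ex in examples]
    return (sum(q_lens) / len(q_lens),
            sum(c_lens) / len(c_lens),
            sum(gold_lens) / len(gold_lens))

# Toy example (not real WikiQA data):
examples = [{"question": "Who wrote Hamlet?",
             "options": ["William Shakespeare wrote Hamlet.",
                         "Hamlet is a Danish city."],
             "label": 0}]
print(token_stats(examples))
```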
Additional exploration included word-frequency analysis and an examination of the lexical overlap and semantic similarity between questions and their correct options.
Set similarity measures were applied to determine the best matching answer:
- Overlap Coefficient
- Sørensen-Dice Coefficient
- Jaccard Similarity
Each measure was evaluated on the training and validation sets by computing accuracy, with ties between identical similarity scores handled explicitly.
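Below is a sketch of the three scorers over lemmatized token sets. The tie-breaking shown (take the first option with the maximal score) is one plausible convention; the project's actual tie handling may differ.

```python
def overlap_coefficient(a, b):
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def sorensen_dice(a, b):
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pick_answer(question_tokens, option_token_lists, score_fn):
    """Index of the highest-scoring option; ties go to the first maximum."""
    scores = [score_fn(question_tokens, opt) for opt in option_token_lists]
    return scores.index(max(scores))
```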
Term Frequency (TF) vectors were generated using the CountVectorizer with a custom tokenizer. Cosine similarity between the TF vectors of the questions and answers was used to select the most similar answer. Accuracy was reported for the training and validation sets.
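A minimal sketch of this approach follows; `str.split` stands in for the custom tokenizer here, and the spaCy-based lemmatizer from the earlier sketch could be passed instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pick_answer_tf(question, options, tokenizer=str.split):
    # Fit on the question plus its options so they share one vocabulary.
    vec = CountVectorizer(tokenizer=tokenizer)
    tf = vec.fit_transform([question] + options)
    sims = cosine_similarity(tf[0], tf[1:]).ravel()  # question vs. each option
    return int(sims.argmax())

print(pick_answer_tf("who wrote hamlet",
                     ["shakespeare wrote hamlet", "hamlet is a city"]))  # -> 0
```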
The BERT model (`bert-base-uncased`) was employed to generate context vectors for questions and answers; the context vector corresponding to the [CLS] token was used. Cosine similarity between the BERT vectors of questions and answers was calculated to select the most similar answer, and accuracy was evaluated on the training and validation sets.
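The following sketch shows one way to extract [CLS] vectors with Hugging Face Transformers and rank options by cosine similarity; the function names are illustrative.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

@torch.no_grad()
def cls_vector(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # last_hidden_state[:, 0] is the hidden state at the [CLS] position.
    return model(**inputs).last_hidden_state[:, 0]

def pick_answer_bert(question, options):
    q = cls_vector(question)
    sims = [torch.cosine_similarity(q, cls_vector(opt)).item() for opt in options]
    return sims.index(max(sims))
```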
A BERT-based sequence classification model was fine-tuned to classify question-option pairs (training and inference are sketched after this list):
- Training Process: The model was trained using a dataset of question-option pairs, where each pair was labeled as correct or incorrect.
- Evaluation Metrics: Accuracy, precision, recall, and F1 score were reported for the validation set.
- Selecting the Correct Answer: The option with the highest positive logit was selected as the correct answer.
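A hedged sketch of the training and answer-selection loop is given below, assuming (question, option, label) triples; the data here is a toy stand-in, and the hyperparameters are illustrative rather than the ones used in the project.

```python
import torch
from torch.utils.data import DataLoader
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def collate(batch):
    questions, options, labels = zip(*batch)
    enc = tokenizer(list(questions), list(options),
                    padding=True, truncation=True, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

pairs = [("Who wrote Hamlet?", "William Shakespeare wrote Hamlet.", 1),
         ("Who wrote Hamlet?", "Hamlet is a Danish city.", 0)]  # toy data
loader = DataLoader(pairs, batch_size=2, shuffle=True, collate_fn=collate)

model.train()
for batch in loader:  # one toy epoch
    loss = model(**batch).loss  # cross-entropy over correct/incorrect
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

@torch.no_grad()
def pick_answer_finetuned(question, options):
    """Select the option with the highest positive-class logit."""
    model.eval()
    enc = tokenizer([question] * len(options), list(options),
                    padding=True, truncation=True, return_tensors="pt")
    return int(model(**enc).logits[:, 1].argmax())
```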
| Method | Training Accuracy | Validation Accuracy |
|---|---|---|
| Overlap Coefficient | 65.3% | 63.2% |
| Sørensen-Dice Coefficient | 67.5% | 65.0% |
| Jaccard Similarity | 66.2% | 64.1% |
| Cosine Similarity (TF Vectors) | 70.4% | 68.3% |
| Cosine Similarity (BERT Vectors) | 78.5% | 75.6% |
| Fine-Tuned BERT (Question-Option) | 82.4% | 79.8% |
Fine-tuning BERT on question-option pairs achieved the highest accuracy, surpassing both the traditional set similarity measures and cosine similarity over TF vectors. BERT's deep contextual understanding provides a significant advantage over the simpler methods.
- Clone the repository: `git clone https://github.com/yourusername/your-repo.git`, then `cd your-repo`.
- Install requirements: `pip install -r requirements.txt`.
- Download the dataset and place it in the appropriate directory.
- Run the notebook: open and run the Jupyter notebook `main_notebook.ipynb` to reproduce the results.
- Python 3.7+
- spacy
- pandas
- numpy
- scikit-learn
- transformers
- torch

Install the necessary Python packages with `pip install -r requirements.txt`.
Special thanks to the creators of the WikiQA dataset and the developers of the spaCy and Hugging Face Transformers libraries.