An NLP pipeline for generating distractors in question-answering systems using context-based entity replacement and multiple choice question evaluation.
The goal of our project was to develop a natural language processing pipeline that generates distractors for question-answering systems. To accomplish this, we fine-tuned the distilbert-base-uncased model on the SQuAD dataset so that it can answer questions about a given context. We then used a pre-trained NER model from the spaCy library to identify named entities in the predicted answers.
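The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the checkpoint `distilbert-base-uncased-distilled-squad` is a publicly available stand-in for the project's own fine-tuned model, and `en_core_web_sm` is an assumed spaCy model (it has to be downloaded separately with `python -m spacy download en_core_web_sm`). The function names are hypothetical.

```python
def answer_question(question, context,
                    model_name="distilbert-base-uncased-distilled-squad"):
    """Return the answer span predicted by an extractive QA model.

    The default model name is a stand-in, not the project's checkpoint.
    """
    # Imported lazily so that defining the function does not require
    # the library; the model is downloaded on first use.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_name)
    result = qa(question=question, context=context)
    return result["answer"]


def tag_entities(text, spacy_model="en_core_web_sm"):
    """Return (entity text, entity label) pairs found by spaCy's NER."""
    import spacy

    nlp = spacy.load(spacy_model)  # assumed model, see note above
    return [(ent.text, ent.label_) for ent in nlp(text).ents]
```

In practice the pipeline and the spaCy model would be loaded once and reused, rather than reloaded on every call as in this sketch.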
To generate the distractors, we replaced the named entities in the answer with entities of the same class taken from the context. For example, if the named entity in the answer was a person, we would look for other person entities in the context to use as distractors. To evaluate the effectiveness of our approach, we used a pre-trained model from Hugging Face named "T5-base fine-tuned on QASC", which is specifically designed for multiple-choice question answering.
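The same-class replacement idea can be illustrated with a small, library-free sketch. It assumes entities are already available as `(text, label)` pairs (for instance built from spaCy's `ent.text` and `ent.label_`); the function name, the `max_distractors` parameter, and the sample data are hypothetical and not taken from the notebooks.

```python
def generate_distractors(answer_entity, context_entities, max_distractors=3):
    """Pick entities from the context that share the answer's class.

    answer_entity: (text, label) pair for the entity in the answer.
    context_entities: list of (text, label) pairs found in the context.
    """
    answer_text, answer_label = answer_entity
    seen = {answer_text.lower()}  # never offer the answer itself
    distractors = []
    for text, label in context_entities:
        if len(distractors) >= max_distractors:
            break
        key = text.lower()
        # Keep only same-class entities that were not used already.
        if label == answer_label and key not in seen:
            seen.add(key)
            distractors.append(text)
    return distractors


# Hypothetical entities, as an NER model might return them.
context_entities = [
    ("Marie Curie", "PERSON"),
    ("Pierre Curie", "PERSON"),
    ("Paris", "GPE"),
    ("Henri Becquerel", "PERSON"),
    ("marie curie", "PERSON"),
]
print(generate_distractors(("Marie Curie", "PERSON"), context_entities))
# prints ['Pierre Curie', 'Henri Becquerel']
```

If the context contains no other entity of the answer's class, the function returns an empty list, so a caller would need to decide how to handle questions without usable distractors.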
Our pipeline generates distractors that are contextually relevant: they differ from the correct answer but remain plausible. Such distractors can be useful for training and evaluating question-answering systems, and may help improve their overall performance.
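For evaluation with a multiple-choice model, the correct answer and its distractors have to be combined into one item, and the model's choices compared against the answer key. The sketch below shows one possible way to do that; the function names, the dictionary layout, and the fixed shuffle seed are assumptions made for illustration, and the exact input format expected by the T5 model is not covered here.

```python
import random


def build_mcq(question, correct_answer, distractors, seed=0):
    """Combine the answer and distractors into one shuffled MCQ item."""
    options = [correct_answer] + list(distractors)
    # A seeded generator keeps the option order reproducible.
    random.Random(seed).shuffle(options)
    return {
        "question": question,
        "options": options,
        "answer_index": options.index(correct_answer),
    }


def accuracy(items, predicted_indices):
    """Fraction of items where the predicted option is the correct one."""
    if not items:
        return 0.0
    correct = sum(
        1 for item, predicted in zip(items, predicted_indices)
        if item["answer_index"] == predicted
    )
    return correct / len(items)


item = build_mcq(
    "Who discovered polonium?",
    "Marie Curie",
    ["Pierre Curie", "Henri Becquerel"],
)
print(item["options"], item["answer_index"])
```

Here `predicted_indices` would hold the option index chosen by the evaluation model for each item.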
Before running our NLP pipeline, it is necessary to have Jupyter Notebook installed on your machine, along with Python 3 and the following libraries:
- transformers
- datasets
- evaluate
- sentencepiece
- spacy
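To confirm that these libraries are available before opening the notebooks, a short check like the following may help. It is only a convenience sketch using the standard library; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names of the libraries listed above.
REQUIRED = ["transformers", "datasets", "evaluate", "sentencepiece", "spacy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```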
To use our NLP pipeline, please follow the steps below:
- Clone the repository to your local machine.
- Install Python 3 (if not already installed) from the official website: https://www.python.org/downloads/
- Install the required libraries by running the following command in your terminal or command prompt:
  pip install transformers datasets evaluate sentencepiece spacy
- Launch Jupyter Notebook and open the three Jupyter notebooks included in the repository:
  - QuestionAnswering.ipynb
  - DistractorGenerator.ipynb
  - DistractorEvaluation.ipynb
- Follow the instructions in the notebooks to run the pipeline.
Please note that the pipeline has been tested on Google Colab, and some of the file paths in the notebooks may point to Google Drive directories. You may need to modify these paths accordingly to ensure that the notebooks work correctly.
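One way to make such paths portable is to resolve a base directory once and build every other path from it. The sketch below assumes that Google Drive is mounted at `/content/drive/MyDrive`, the usual location on Colab; the function name and the `models` subdirectory are hypothetical and need to be adapted to the paths actually used in the notebooks.

```python
from pathlib import Path


def resolve_base_dir(drive_dir="/content/drive/MyDrive", local_dir="."):
    """Use the Google Drive directory if it exists, else a local one."""
    drive_path = Path(drive_dir)
    return drive_path if drive_path.is_dir() else Path(local_dir)


BASE_DIR = resolve_base_dir()
# Hypothetical subdirectory; replace with the notebooks' real paths.
MODEL_DIR = BASE_DIR / "models"
print(MODEL_DIR)
```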
Metin Usta - metin.usta01@hotmail.com
Muhannad Tuameh - muhannadtumah@gmail.com
Project Link: https://github.com/MetinUsta/Multiple-Choice-Distractor-Generator