An NLP system trained on the Stanford Question Answering Dataset (SQuAD). SQuAD tests the ability of a system to not only answer reading comprehension questions, but also to abstain when presented with a question that cannot be answered based on the provided paragraph.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
Below you can find the instructions for downloading the dataset, the pre-trained embeddings, and the weights of our trained model.
Please note that the software will automatically download all of them, so you are not required to follow the procedure below manually.
First of all, you have to create a new /data folder at the root level of the project.
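For instance, the folders used in the steps below can be created in one go from the project root:
$ mkdir -p data/raw data/checkpoints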
The original dataset is available at the following link.
Once you have downloaded it, place the unzipped file training_set.json inside the /data/raw/ folder.
The pre-trained GloVe embeddings we used are available at the following link.
Once you have downloaded them, place the unzipped file Glove_50.txt inside the /data/raw/ folder.
The weights of the trained model are available at the following link.
Once you have downloaded them, place the unzipped file DRQA.h5 inside the /data/checkpoints/ folder.
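After completing the steps above, your data folder should look like this:
data
├── raw
│   ├── training_set.json
│   └── Glove_50.txt
└── checkpoints
    └── DRQA.h5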
If you are using the new Apple M1 chip, please make sure hdf5 is installed by running:
$ brew install hdf5
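If the Python setup later fails to find the Homebrew hdf5 while building h5py, a common workaround (an assumption, not a project-specific instruction) is to export the Homebrew prefix before running the setup command:
$ export HDF5_DIR="$(brew --prefix hdf5)"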
Below you can find the setup command for each OS/processor combination; running make with no target prints the following table:
$ make
> "+------------------------------------------------------+"
> "| OS | Hardware | Setup Command |"
> "+------------------------------------------------------+"
> "| Windows/Linux | - GPU | 'make setup.CPU' |"
> "| Windows/Linux | + GPU | 'make setup.GPU' |"
> "| Apple macOS | + M1 | 'make setup.M1' |"
> "| Apple macOS | - M1 | 'make setup.CPU' |"
> "+------------------------------------------------------+"
For instance, if you have macOS with an Intel chip, you have to run:
$ make setup.CPU
Alternatively, you can find all the different versions of the requirements inside the /tools/requirements folder.
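If you prefer installing the dependencies manually instead of using make, you can point pip at the requirements file matching your platform; the exact file names are not listed here, so the path below is a placeholder:
$ pip install -r tools/requirements/<requirements_file_for_your_platform>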
You can train the model from scratch using your custom dataset by running:
$ python src/train.py "<path_of_your_json_dataset>"
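For example, to train on the original dataset downloaded in the steps above:
$ python src/train.py "data/raw/training_set.json"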
You can run the inference procedure on a specific dataset by running:
$ python src/compute_answers.py "<path_of_your_json_dataset>"
Once you have done this, you can find the generated output inside the /data/predictions/answers.pred.json file.
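Assuming the predictions follow the standard SQuAD prediction format (an assumption, not verified here), answers.pred.json maps each question id to the predicted answer text, with an empty string commonly used when the model abstains:
{
  "<question_id_1>": "<predicted answer text>",
  "<question_id_2>": ""
}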
You can evaluate the performance of the inference by running:
$ python src/evaluate.py "<path_of_your_json_dataset>" data/predictions/answers.pred.json
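For example, to evaluate the predictions generated above against the original dataset:
$ python src/evaluate.py "data/raw/training_set.json" data/predictions/answers.pred.json
The standard SQuAD evaluation reports Exact Match (EM) and F1 scores; the exact output of evaluate.py may differ.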
- Matteo Conti - author - contimatteo
- Francesco Palmisano - author - Frankgamer97
- Primiano Arminio Cristino - author - primianocristino
- Luciano Massaccesi - author - fruscello
This project is licensed under the MIT License - see the LICENSE.md file for details