Please run `pip install -r requirements.txt` (Python 3 required). For fine-tuning on the TechQA dataset, use this.
- Our proposed RoBERTa-based variants
- Ablation studies: changing the document encoder of the RoBERTa-based variants to 'Paragraph Encoder + 2-layer transformer'
- Our proposed BERT-based variants
- Baselines
To download the SQuAD 2.0 training set, run:

```
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
```
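The downloaded `train-v2.0.json` follows the standard SQuAD 2.0 layout (articles → paragraphs → question-answer pairs, with an `is_impossible` flag marking unanswerable questions). A minimal sketch of iterating over it; the tiny inline dict below is a stand-in for the real file:

```python
import json

def count_questions(squad):
    """Count answerable and unanswerable questions in a SQuAD 2.0-style dict."""
    answerable = unanswerable = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    unanswerable += 1
                else:
                    answerable += 1
    return answerable, unanswerable

# Tiny stand-in with the same shape as train-v2.0.json;
# for the real file use: squad = json.load(open("train-v2.0.json"))
squad = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "SQuAD 2.0 adds unanswerable questions.",
            "qas": [
                {"id": "q1", "question": "What does SQuAD 2.0 add?",
                 "answers": [{"text": "unanswerable questions", "answer_start": 15}],
                 "is_impossible": False},
                {"id": "q2", "question": "Who wrote this?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }],
}

print(count_questions(squad))  # (1, 1)
```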
Run:

```
python3 finetune_squad.py <MODEL_TYPE> <MODEL_PATH>
```

where `<MODEL_TYPE>` can be `bert` or `roberta`, and `<MODEL_PATH>` is a local model path or a HuggingFace model name.
To get the models fine-tuned on SQuAD 2.0, add `_squad2.0` at the end of a pre-trained model's link. For example, the link to the 'triplet + hier.' RoBERTa-based model obtained after pre-training and fine-tuned on SQuAD 2.0 is https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1_squad2.0
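The naming convention above can be applied programmatically; a small sketch (the base name comes from the example link, and the commented `transformers` usage is illustrative, not one of this repo's scripts):

```python
# A SQuAD 2.0 checkpoint name is the pre-trained model's Hub name
# plus the _squad2.0 suffix.
def squad2_checkpoint(base_name: str) -> str:
    return base_name + "_squad2.0"

model_name = squad2_checkpoint(
    "AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1"
)
print(model_name)

# Illustrative usage with transformers (downloads the checkpoint from the Hub):
#   from transformers import pipeline
#   qa = pipeline("question-answering", model=model_name)
#   qa(question="...", context="...")
```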