Code for the ACL Rolling Review submission - ''

Required dependencies

Please run pip install -r requirements.txt (Python 3 is required). For fine-tuning on the TechQA Dataset, use this.

Links to models pre-trained on the EManuals Corpus

  • Our proposed RoBERTa-based variants
  1. hierarchical network
  2. triplet network
  3. triplet + hier. network
  • Ablation studies - changing the document encoder of RoBERTa-based variants to 'Paragraph Encoder + 2-layer transformer'
  1. hierarchical network
  2. triplet network
  3. triplet + hier. network
  • Our proposed BERT-based variants
  1. hierarchical network
  2. triplet network
  3. triplet + hier. network
  • Baselines
  1. bert-base-uncased
  2. roberta-base
  3. EManuals_BERT
  4. EManuals_RoBERTa
  5. DeCLUTR
  6. CLINE
  7. ConSERT
  8. SPECTER

Fine-tuning on SQuAD 2.0

  • To download the training set, run wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json.

  • Run python3 finetune_squad.py <MODEL_TYPE> <MODEL_PATH>

    • <MODEL_TYPE> can be bert or roberta
    • <MODEL_PATH> is either a local model path or a HuggingFace model name.
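
The two command-line arguments above could be validated and mapped to model classes roughly as follows. This is only a sketch: the internals of finetune_squad.py are not shown here, so the helper names parse_args and load_qa_model, and the choice of the Hugging Face question-answering classes, are assumptions made for illustration.

```python
# Assumed mapping from <MODEL_TYPE> to Hugging Face transformers class names.
# The actual script may use different classes.
QA_CLASSES = {
    "bert": ("BertTokenizerFast", "BertForQuestionAnswering"),
    "roberta": ("RobertaTokenizerFast", "RobertaForQuestionAnswering"),
}


def parse_args(argv):
    """Validate [<MODEL_TYPE>, <MODEL_PATH>] and return them as a tuple."""
    if len(argv) != 2:
        raise ValueError(
            "usage: python3 finetune_squad.py <MODEL_TYPE> <MODEL_PATH>"
        )
    model_type, model_path = argv
    if model_type not in QA_CLASSES:
        raise ValueError(
            f"<MODEL_TYPE> must be one of {sorted(QA_CLASSES)}, got {model_type!r}"
        )
    return model_type, model_path


def load_qa_model(model_type, model_path):
    """Load a tokenizer and QA model from a local path or a Hub model name."""
    import transformers  # imported lazily so argument checks work without it

    tokenizer_name, model_name = QA_CLASSES[model_type]
    tokenizer = getattr(transformers, tokenizer_name).from_pretrained(model_path)
    model = getattr(transformers, model_name).from_pretrained(model_path)
    return tokenizer, model
```

With these hypothetical helpers, load_qa_model(*parse_args(["roberta", "roberta-base"])) would load the roberta-base baseline, and an unsupported <MODEL_TYPE> is rejected before any download starts.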

To get a model fine-tuned on SQuAD 2.0, append _squad2.0 to the end of the pre-trained model's link. For example, https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1_squad2.0 is the link to the 'triplet + hier.' RoBERTa-based model after pre-training and fine-tuning on SQuAD 2.0.
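
The naming rule above can be applied programmatically. The helper below is a minimal sketch; the function name squad2_link is hypothetical and not part of this repository.

```python
SQUAD_SUFFIX = "_squad2.0"


def squad2_link(pretrained_link: str) -> str:
    """Return the link of the SQuAD 2.0 fine-tuned counterpart of a model."""
    link = pretrained_link.rstrip("/")  # tolerate a trailing slash
    if link.endswith(SQUAD_SUFFIX):
        return link  # already points at a fine-tuned model
    return link + SQUAD_SUFFIX
```

Passing the pre-trained 'triplet + hier.' RoBERTa link (the example link without its _squad2.0 suffix) yields the fine-tuned link given in the example.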

Fine-tuning on TechQA Dataset

Fine-tuning on S10 QA Dataset