Quora-Question-Pairs Project
This is the final project for nis8021: sentence similarity prediction on the Quora Question Pairs (QQP) dataset. The dataset is split into train/valid/test sets in ./data.
Results
Model | Test Accuracy | Test F1 Score |
---|---|---|
RoBERTa_base + CE loss | 0.910633 | 0.904185 |
RoBERTa_pretrained + CE loss | 0.913156 | 0.907138 |
RoBERTa_pretrained + Focal loss | 0.913453 | 0.907755 |
Stacking | 0.917955 | 0.912335 |
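
For reference, the two reported metrics can be computed from binary predictions roughly as follows. This is a minimal sketch, not the project's evaluation code: it assumes label 1 marks a duplicate pair and that F1 is the binary F1 of that positive class (the README does not say which averaging was used). The function name `accuracy_and_f1` and the toy labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only (not taken from the QQP test set).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```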
Focal loss
Focal loss uses two parameters. Calculate them with:
python cal_focal_params.py
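
As an illustration of how the two parameters enter the loss, here is a minimal sketch of binary focal loss in its common form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It assumes the two parameters are the usual class weight `alpha` and focusing exponent `gamma`; the default values below are placeholders, not the output of `cal_focal_params.py`, and the function names are hypothetical.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one example; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one example.

    alpha weights the positive class (1 - alpha weights the negative class);
    gamma down-weights examples the model already classifies confidently.
    Both defaults are placeholder values (assumption).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction contributes far less than an uncertain one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.6, 1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the loss reduces to cross-entropy scaled by `alpha_t`, which is a convenient sanity check when wiring it into training.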
Run
- Pretrain on QQP
bash run_pretrain.sh
- Finetune with RoBERTa_base
bash run_finetune_roberta-base.sh
- Finetune with RoBERTa_pretrained
bash run_finetune_ce.sh
- Finetune with RoBERTa_pretrained using focal loss
bash run_finetune_focal.sh
- Run inference to construct the dataset for stacking
bash inference.sh
- Train a Stacking model
python stacking.py
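
The idea behind the stacking step can be sketched as follows: each base model's predicted probability becomes one input feature, and a small meta-learner is trained on those features. This is only an illustration under stated assumptions. The README does not say which meta-learner `stacking.py` uses, so plain logistic regression trained by gradient descent is assumed here, and the probabilities below are made-up toy values, not inference outputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_learner(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-model probabilities with batch gradient descent."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (duplicate) if the stacked probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Toy data: one row per question pair, one column per base model
# (for example the CE, pretrained + CE, and focal-loss models). Values are invented.
features = [
    [0.90, 0.80, 0.85],
    [0.20, 0.10, 0.30],
    [0.70, 0.60, 0.90],
    [0.40, 0.30, 0.20],
    [0.80, 0.90, 0.70],
    [0.10, 0.20, 0.15],
]
labels = [1, 0, 1, 0, 1, 0]

weights, bias = train_meta_learner(features, labels)
train_preds = [predict(weights, bias, x) for x in features]
print(train_preds)
```

In practice the meta-learner should be trained on predictions for data the base models were not fitted on (such as the validation set), otherwise it learns to trust overfit probabilities.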