This is our code submission for QRCD. Please contact me if you encounter any issues:

- mohammed.a.elkomy@gmail.com
- mohammed.a.elkomy@f-eng.tanta.edu.eg

Paper: https://arxiv.org/abs/2206.01550
```
.
├── answer_voting_ensemble.py   # the script used for the ensemble
├── data                        # holds the dataset and scripts for reading/evaluating it
├── post_processing
│   ├── __init__.py
│   ├── print_results_table.py  # view checkpoint result tables
│   └── results
│       ├── eval                # the checkpoints and run files for the development phase
│       └── test                # the checkpoints and run files for the test phase
├── readme.md
├── run_qa.py                   # to train a model
├── tmp                         # a placeholder folder for temporary files
│   └── tmp.md
├── trainer_qa.py               # a helper script
└── utils_qa.py                 # a helper script
```
- I train the different models on Colab and download a `.dump` file for each run, which makes it easier to work with the results locally for testing and debugging.
- These dump files represent the results of a trained checkpoint; any file ending with `.dump` is an example of them.
- To load any of them, use the `joblib` library and call `joblib.load(filename)` (see the snippet after this list).
- After collecting the dump files for all of our trained models, we feed them to the ensemble script. We employ a self-ensemble approach that combines checkpoints of the same model initialized with different seeds.
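A minimal sketch of inspecting one of these `.dump` files locally; the file name below is a placeholder, and the exact structure of the stored object depends on what the training run saved.

```python
import joblib

# Load one checkpoint dump produced on Colab (the file name here is a placeholder).
dump = joblib.load("post_processing/results/eval/some_checkpoint.dump")

# Inspect what the dump contains before feeding it to the ensemble script.
print(type(dump))
```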
`QRCD_demo.ipynb` is a demo notebook for reproducing all of the reported checkpoints; you just need to download the dump files generated on Colab to your local machine.
- Our ensemble takes a list of dump files and combines them based on their softmax scores (a sketch of the idea follows this list).
- It then applies post-processing.
- The saved checkpoints will be written to `post_processing/results`.
- To reproduce a model, pass the corresponding seed when training, e.g. `--seed 14` for a seed of 14.
- If you are trying to reproduce a result manually, please verify that the dump files you download from Colab are the same as the ones shared on the drive link above.
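For illustration only, here is a minimal sketch of the softmax-score voting idea; it is not the code in `answer_voting_ensemble.py`, and the per-checkpoint prediction format (a list of `(answer, score)` pairs per question) is an assumption.

```python
from collections import defaultdict

def vote_answers(per_model_predictions):
    """Sum the softmax scores each checkpoint assigns to a candidate answer
    and rank candidates by the accumulated score (highest first)."""
    scores = defaultdict(float)
    for predictions in per_model_predictions:
        for answer, score in predictions:
            scores[answer] += score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy example: two checkpoints predicting on the same question.
ranked = vote_answers([
    [("answer A", 0.75), ("answer B", 0.25)],
    [("answer B", 0.5), ("answer A", 0.5)],
])
print(ranked)  # [('answer A', 1.25), ('answer B', 0.75)]
```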
Reproducing Table 2 in the paper: total number of models = 15 LARGE + 15 BASE + 15 ARBERT = 45.
We have 15 checkpoints; the seeds are:
8045, 32558, 79727, 30429, 48910, 46840, 24384, 55067, 13718, 16213, 63304, 40732, 38609, 22228, 71549
We have 15 checkpoints; the seeds are:
71338, 67981, 29808, 67961, 25668, 20181, 20178, 67985, 67982, 23415, 20172, 20166, 25982, 27073, 26612
We have 15 checkpoints; the seeds are:
64976, 64988, 73862, 84804, 79583, 81181, 59377, 59382, 73869, 77564, 79723, 64952, 73865, 59373, 84349
Reproducing Table 3 in the paper: total number of models = 16 LARGE + 18 BASE + 17 ARBERT = 51.
We have 16 checkpoints; the seeds are:
1114, 18695, 23293, 27892, 5748, 59131, 63847, 68498, 73133, 77793, 82431, 87062, 91701, 94452, 96475, 98797
We have 18 checkpoints; the seeds are:
54235, 60998, 64662, 80936, 80955, 80959, 80970, 80988, 82916, 84448, 84481, 84665, 84749, 84871, 87891, 87917, 88329, 88469
We have 17 checkpoints; the seeds are:
107, 14, 43919, 47360, 50798, 57621, 86829, 88813, 90781, 91496, 91533, 94949, 95000, 96521, 96552, 98412, 98465
- To train any of the checkpoints above, check the `QRCD_demo.ipynb` notebook; it runs the `run_qa.py` script (see the example invocation below).
- Download the dump files created during training and write them to `post_processing/results/eval` or `post_processing/results/test`.
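A hedged sketch of what a training invocation might look like: the only flag confirmed above is `--seed`; the other flag names are assumptions based on the standard Hugging Face `run_qa.py` interface, and the exact arguments used for each checkpoint are in `QRCD_demo.ipynb`.

```bash
# Illustrative only -- see QRCD_demo.ipynb for the exact arguments used.
# Data/model arguments are placeholders, not the values used in the paper.
python run_qa.py \
  --model_name_or_path <model-checkpoint> \
  --output_dir tmp/run_seed_14 \
  --do_train \
  --do_eval \
  --seed 14
```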
- To run the ensemble, just run `python answer_voting_ensemble.py`.
- This will write the files for both the eval-phase and test-phase ensembles.
- JSON submission files will be saved to `post_processing/results/eval` and `post_processing/results/test`.
- If you would like to evaluate them, you may use the official `quranqa22_eval.py` script.
- To print the tables reproducing Table 2 in the paper, just run `python print_results_table.py`.
- Make sure you are in the `post_processing` directory (see the snippet below).
- To evaluate JSON files, you may run the `evaluate_official.py` script.
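Putting the two steps above together, from the repository root:

```bash
cd post_processing
python print_results_table.py
```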
| Model \ Metric | pRR | EM | F1 |
|---|---|---|---|
| Original | 0.639 | 0.390 | 0.594 |
| Uninformative answers kept | 0.652 | 0.394 | 0.594 |
| Uninformative answers removed | 0.652 | 0.385 | 0.593 |

| Model \ Metric | pRR | EM | F1 |
|---|---|---|---|
| Original | 0.542 | 0.264 | 0.480 |
| Uninformative answers kept | 0.557 | 0.268 | 0.485 |
| Uninformative answers removed | 0.565 | 0.273 | 0.494 |
The "uninformative answers removed" ensemble achieved a pRR score of 0.565, securing first place 🥇 among accepted papers 🤓.