Google-Quest-Answer

This repository contains the code for Google-Quest-Answer.

Structure for data

Please arrange the project folder as follows:

codes
└── all codes in this repo

input
└── google-quest-challenge
      ├── train.csv
      ├── test.csv
      ├── train_augment_final_with_clean.csv (in translation_data folder)
      ├── sample_submission.csv
      └── split
           └── ...

model
├── bert
├── xlnet
└── ...
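
Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repo: the helper name `missing_paths` is hypothetical, and it assumes the script is run from the project root that contains `codes`, `input` and `model`.

```python
from pathlib import Path

# Entries copied from the layout above. The project root location is an assumption.
EXPECTED = [
    "codes",
    "input/google-quest-challenge/train.csv",
    "input/google-quest-challenge/test.csv",
    "input/google-quest-challenge/train_augment_final_with_clean.csv",
    "input/google-quest-challenge/sample_submission.csv",
    "input/google-quest-challenge/split",
    "model",
]


def missing_paths(project_root):
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the current working directory is the project root.
    for rel in missing_paths("."):
        print("missing:", rel)
```

An empty result means every listed file and folder was found.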

Codes for Dataset

The dataset code is in the "dataset" folder. To run its tests (train/validation splitting, train_data_loader, val_data_loader, test_dataloader), run:

python3 dataset.py
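
To illustrate what a train/validation split for k-fold training looks like, here is a minimal standard-library sketch. It is not the logic in dataset.py, which may use a different scheme (for example grouped or stratified folds); the function name `k_fold_indices`, the default of 5 folds and the seed are all assumptions.

```python
import random


def k_fold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample lands in exactly one val fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle, so splits are reproducible
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, val_idx


if __name__ == "__main__":
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, n_splits=5)):
        print(fold, "train:", train_idx, "val:", val_idx)
```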

Codes for Model

The model code is in the "model" folder. You can use "check_model.ipynb" to inspect the model architecture, and you can run the model tests with:

python3 model_bert.py

Codes for Training

For training, first update the path in each script, then run:

./bert-uncased-k-fold.sh

./bert-cased-k-fold.sh

./xlnet-cased-k-fold.sh

./roberta-base-k-fold.sh

single model	hidden_layers	MIN_LR	config.hidden_dropout_prob
bert-base-uncased, question_answer	[-1, -3, -5, -7, -9]	2e-6	0.1
bert-base-uncased, question+answer	[-1, -3, -5, -7, -9]	2e-6	0
bert-base-cased, question_answer	[-1, -3, -5, -7, -9]	2e-6	0.1
bert-base-cased, question+answer	[-2, -4, -6, -8, -10]	2e-6	0.1
xlnet-base-cased, question_answer	[-3, -4, -5, -6, -7]	1.5e-6	0
xlnet-base-cased, question+answer	[-3, -4, -5, -6, -7]	2e-6	0
roberta-base, question_answer	[-3, -4, -5, -6, -7]	1.5e-6	0
roberta-base, question+answer	[-3, -4, -5, -6, -7]	2e-6	0
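
The settings in the table can be kept as a small lookup, sketched below. The values are copied from the table; the dictionary layout and the helper `select_hidden_states` are illustrative, not taken from the training scripts. The sketch also assumes that hidden_layers holds Python-style negative indices into the sequence of per-layer outputs (embeddings plus 12 transformer layers for a base model). How the selected layers are then combined is not covered here.

```python
# Values copied from the table above. The dictionary layout itself is illustrative.
CONFIGS = {
    ("bert-base-uncased", "question_answer"): ([-1, -3, -5, -7, -9], 2e-6, 0.1),
    ("bert-base-uncased", "question+answer"): ([-1, -3, -5, -7, -9], 2e-6, 0),
    ("bert-base-cased", "question_answer"): ([-1, -3, -5, -7, -9], 2e-6, 0.1),
    ("bert-base-cased", "question+answer"): ([-2, -4, -6, -8, -10], 2e-6, 0.1),
    ("xlnet-base-cased", "question_answer"): ([-3, -4, -5, -6, -7], 1.5e-6, 0),
    ("xlnet-base-cased", "question+answer"): ([-3, -4, -5, -6, -7], 2e-6, 0),
    ("roberta-base", "question_answer"): ([-3, -4, -5, -6, -7], 1.5e-6, 0),
    ("roberta-base", "question+answer"): ([-3, -4, -5, -6, -7], 2e-6, 0),
}


def select_hidden_states(all_hidden_states, hidden_layers):
    """Pick per-layer outputs by (negative) index. Assumed meaning of hidden_layers."""
    return [all_hidden_states[i] for i in hidden_layers]


# Stand-in labels for the 13 outputs of a 12-layer base model (assumption).
states = ["embeddings"] + [f"layer_{i}" for i in range(1, 13)]
hidden_layers, min_lr, dropout = CONFIGS[("bert-base-uncased", "question_answer")]
picked = select_hidden_states(states, hidden_layers)
print(picked)  # ['layer_12', 'layer_10', 'layer_8', 'layer_6', 'layer_4']
```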

Codes for SWA

For a simple SWA (stochastic weight averaging) implementation, which is not the official code, first update the path in each script, then run:

./swa-bert-base-uncased-k-fold.sh

./swa-bert-base-cased-k-fold.sh

./swa-xlnet-cased-k-fold.sh

./swa-roberta-base-k-fold.sh
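
The core idea of weight averaging can be sketched in a few lines. This is only an illustration of equal-weight averaging over saved checkpoints, not the logic of the scripts above; the function name `average_checkpoints` is hypothetical, and plain floats stand in for the weight tensors.

```python
def average_checkpoints(checkpoints):
    """Average parameter values key by key across checkpoints (equal weights)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: sum(ckpt[name] for ckpt in checkpoints) / n
        for name in checkpoints[0]
    }


# Toy checkpoints: each maps a parameter name to a value.
ckpts = [
    {"w": 1.0, "b": 0.0},
    {"w": 3.0, "b": 1.0},
    {"w": 5.0, "b": 2.0},
]
swa_weights = average_checkpoints(ckpts)
print(swa_weights)  # {'w': 3.0, 'b': 1.0}
```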

Codes for Getting oof

To generate out-of-fold (oof) predictions, first update the path in each script, then run:

./oof-bert-uncased-k-fold.sh

./oof-bert-cased-k-fold.sh

./oof-xlnet-cased-k-fold.sh

./oof-roberta-base-k-fold.sh
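
Out-of-fold predictions are assembled by letting each fold's model predict only its own validation samples, so every training sample gets a prediction from a model that never saw it. The sketch below shows that bookkeeping under assumed inputs; `assemble_oof` and its argument format are hypothetical and do not describe the scripts above.

```python
def assemble_oof(n_samples, fold_predictions):
    """Merge per-fold validation predictions into one list aligned with the data.

    fold_predictions: list of (val_indices, predictions) pairs, one per fold.
    """
    oof = [None] * n_samples
    for val_idx, preds in fold_predictions:
        for i, p in zip(val_idx, preds):
            if oof[i] is not None:
                raise ValueError(f"sample {i} appears in more than one validation fold")
            oof[i] = p
    if any(p is None for p in oof):
        raise ValueError("some samples were never predicted")
    return oof


# Three toy folds over five samples.
folds = [([0, 3], [0.9, 0.1]), ([1, 4], [0.5, 0.7]), ([2], [0.3])]
print(assemble_oof(5, folds))  # [0.9, 0.5, 0.3, 0.1, 0.7]
```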

Model performance (oof)

single model	oof
bert-base-uncased, question_answer	0.403928
bert-base-uncased, question+answer	0.404822
bert-base-cased, question_answer	0.403596
bert-base-cased, question+answer	0.405100
xlnet-base-cased, question_answer	0.398455
xlnet-base-cased, question+answer	0.410154
roberta-base, question_answer	0.395185
roberta-base, question+answer	0.412353
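
The table does not name the metric. Assuming the oof score is the mean column-wise Spearman rank correlation used to score this competition, it could be computed roughly as follows. This is a standard-library sketch with hypothetical function names, not the scoring code of this repo; it fails with a division by zero if a column is constant.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def mean_columnwise_spearman(targets, predictions):
    """targets and predictions are lists of rows; score each column, then average."""
    n_cols = len(targets[0])
    scores = [
        spearman([row[c] for row in targets], [row[c] for row in predictions])
        for c in range(n_cols)
    ]
    return sum(scores) / n_cols
```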

The oof files are available at https://www.kaggle.com/jionie/qaallmodellogs

Codes for inference

Please use "models-with-optimization-v5.ipynb" in the "inference" folder; it is also available at https://www.kaggle.com/jionie/models-with-optimization-v5

Codes for postprocessing

You can test postprocessing on all the oof files with "test_postprocessing.py" in the "postprocessing_optimization" folder.

License

MIT
