Google-Quest-Answer

This repository contains codes for Google-Quest-Answer.

Structure for data

please arrange project folder as

codes
└── all codes in this repo
input
└── google-quest-challenge
      ├── train.csv
      ├── test.csv
      ├── train_augment_final_with_clean.csv (in translation_data folder)
      ├── sample_submission.csv
      └── split
           └── ...
model
└── bert
└── xlnet 
└── ...

Codes for Dataset

Please check codes for Dataset in "dataset" folder, you could run tests for (splitting train val sets, train_data_loader, val_data_loader, test_dataloader):

python3 dataset.py

Codes for Model

Please check codes for Model in "model" folder, you could run tests for models, and you can use "check_model.ipynb" to check model architecture:

python3 model_bert.py

Codes for Training

Please check codes for Training, you should change the path first then run:

./bert-uncased-k-fold.sh
./bert-cased-k-fold.sh
./xlnet-cased-k-fold.sh
./roberta-base-k-fold.sh
single model hidden_layers MIN_LR config.hidden_dropout_prob
bert-base-uncased, question_answer [-1, -3, -5, -7, -9] 2e-6 0.1
bert-base-uncased, question+answer [-1, -3, -5, -7, -9] 2e-6 0
bert-base-cased, question_answer [-1, -3, -5, -7, -9] 2e-6 0.1
bert-base-cased, question+answer [-2, -4, -6, -8, -10] 2e-6 0.1
xlnet-base-cased, question_answer [-3, -4, -5, -6, -7] 1.5e-6 0
xlnet-base-cased, question+answer [-3, -4, -5, -6, -7] 2e-6 0
roberta-base, question_answer [-3, -4, -5, -6, -7] 1.5e-6 0
roberta-base, question+answer [-3, -4, -5, -6, -7] 2e-6 0

Codes for SWA

Please check codes for simple SWA (not official codes), you should change the path first then run:

./swa-bert-base-uncased-k-fold.sh
./swa-bert-base-cased-k-fold.sh
./swa-xlnet-cased-k-fold.sh
./swa-roberta-base-k-fold.sh

Codes for Getting oof

Please check codes for oof, you should change the path first then run:

./oof-bert-uncased-k-fold.sh
./oof-bert-cased-k-fold.sh
./oof-xlnet-cased-k-fold.sh
./oof-roberta-base-k-fold.sh

model performace (oof)

single model oof
bert-base-uncased, question_answer 0.403928
bert-base-uncased, question+answer 0.404822
bert-base-cased, question_answer 0.403596
bert-base-cased, question+answer 0.405100
xlnet-base-cased, question_answer 0.398455
xlnet-base-cased, question+answer 0.410154
roberta-base, question_answer 0.395185
roberta-base, question+answer 0.412353

The oof files are in https://www.kaggle.com/jionie/qaallmodellogs

Codes for inference

Please use "models-with-optimization-v5.ipynb" in "inference" folder, this is also available on https://www.kaggle.com/jionie/models-with-optimization-v5

Codes for postprocessing

You can test postprocessing with all oof files and "test_postprocessing.py" in "postprocessing_optimization" folder.

License

MIT