huggingface/pytorch-pretrained-BERT repository contains op-for-op PyTorch reimplementations, pre-trained models and fine-tuning examples for Google's BERT model. And as a result of submission using the run_squad.py code provided by huggingface/pytorch-pretrained-BERT repository, it ranked 30th in the test set with EM= 71.47, F1= 89.71, respectively, as shown below. (2019.06.17)
A trained BERT model is publicly available. So, I'm going to cover the process of submitting a model and result for official evaluation on KorQuAD. Once your model has been evaluated officially, your scores will be added to the leaderboard. Thus I would assume you already completed model training in KorQuAD, and have a trained model archive.
A pre-trained language model, BERT, is publicly available. For KorQuAD submission, what you have to do is to fine-tune the pre-trained BERT model on KorQuAD. And fine-tuning can be done simply by running run_squad.py
(in here) on the KorQuAD dataset.
And, even if you fine-tune BERT with the default hyper-parameters in run_squad.py
, we can get the following results (It's a score that can be ranked in the 30th grade, based on June 2019):
$ python evaluate-v1.0.py KorQuAD_v1.0_dev.json predictions.json
{"exact_match": 70.29788708001385, "f1": 90.08062112089534}
Below table shows the results of the BERT models fine-tuned with various hyper-parameters.
- Model : model name with detailed description
- #steps : number of optimization steps in training
- EM (Exact Match) : ratio of accurate prediction to actualanswer text
- F1 : score on how the actual answer text overlaps with the prediction
Model | #stpes | EM(dev) | F1(dev) |
---|---|---|---|
change maximum sequence length | |||
BERT-Multilingual - baseline (description) | 7,587 | 70.298 | 90.081 |
BERT-Multilingual - baseline (+ max_seq_length=512) (description) | 6,336 | 70.80 | 90.104 |
change train batch size | |||
BERT-Multilingual - baseline (+ train_batch_size=16) (description) | 15,174 | 70.159 | 89.818 |
BERT-Multilingual - baseline (+ max_seq_length=512, train_batch_size=16) (description) | 12,669 | 69.830 | 89.407 |
change learning-rate | |||
BERT-Multilingual - baseline (+ learning_rate=3e-5) (description) | 7,587 | 70.419 | 90.241 |
BERT-Multilingual - baseline (+ learning_rate=3e-5, train_batch_size=16) (description) | 15,174 | 70.229 | 90.114 |
BERT-Multilingual - baseline (+ max_seq_length=512, learning_rate=3e-5) | 6,336 | 70.644 | 90.246 |
BERT-Multilingual - baseline (+ max_seq_length=512, learning_rate=3e-5, train_batch_size=16) | 12,669 | 70.419 | 90.179 |
Tips for hyper-parameter tuning
pre-trained model
:bert-large-cased
, pre-trained model, is not recommendednum_train_epochs
: set to default valuelearning rate
: the smaller, the better
To get official scores on the KorQuAD test set, we must submit(or upload) our model to the CodaLab. This is because the integrity of test results should be preserved.
Here's a detailed process that guides you through the official evaluation of our model.
You can create a CodaLab account here.
Click sign up
in the top-right corner of the CodaLab homepage.
Click the My dashboard
in the top-right corner.
Click the New Worksheet
in the upper-right corner and name your worksheet.
Begin by uploading archive for the trained model onto Codalab.
For example, I trained (BERT) model using run_squad.py
code from huggingface/pytorch-pretrained-BERT repository, and the archive of trained model consists of the following:
- run_squad.py : python script to generate the predictions
- vocab.txt : vocabulary file
- config.json : a configuration file for the model
- pytorch_model.bin : a PyTorch dump of a trained BERT model (saved with the usual torch.save())
CodaLab requires us that the prediction python script should run with the following arguments. Therefore, if your python script does not run with the following arguments, the script needs to be modified.
CodaLab> python <path-to-prediction-python-script> <input-data-json-file> <output-prediction-json-file>
If you use run_squad.py
code from huggingface/pytorch-pretrained-BERT repository for training, and have difficulty modifying this to the above format, just use run_squad_for_submission.py
in this repository. It runs with the arguments required by CodaLab, and generates predictions. And it should be in the same place as the archive of a trained model (vocab.txt, config.json, pytorch_model.bin).
Copy the dev data to the worksheet by using the following command into the CodaLab terminal. Do not upload the dev data directly!
CodaLab> cl add bundle korquad-data//KorQuAD_v1.0_dev.json .
Then run the command for the python script generates the predictions on the dev data. Be sure to replace <path-to-prediction-python-script>
with your actual program path.
CodaLab> cl run :KorQuAD_v1.0_dev.json :<path-to-prediction-python-script> "python <path-to-prediction-python-script> KorQuAD_v1.0_dev.json predictions.json" -n run-predictions
If you run the following command on the default Docker image(codalab/ubuntu:1.9), the GPU is not available because the image does not support CUDA. This can be checked as follows.
CodaLab> cl run :check_cuda.py "python check_cuda.py"
# stdout>
# ('Python Version: ', '2.7.12')
# ('PyTorch Version: ', '0.4.1')
# ('CUDA Available: ', False)
# ('Device Count: ', 0)
To enable GPU acceleration, we need to include the --request-gpus
flag like belows.
CodaLab> cl run :check_cuda.py "python check_cuda.py" --request-gpus 1
# stdout>
# ('Python Version: ', '2.7.12')
# ('PyTorch Version: ', '1.0.1.post2')
# ('CUDA Available: ', True)
# ('Device Count: ', 1L)
# ('Current Device: ', 0L)
# ('Device Name', u'Tesla M60')
cf. You can also increase the GPU memory by using --request-memory
flag, because default GPU memory size (2GB) is too small.
CodaLab> cl run :check_cuda.py "python check_cuda.py" --request-gpus 1 --request-memory 11g
Now, all jobs (e.g., prediction) can be conducted on a GPU.
CodaLab uses Docker containers to define the environment of a run bundle. Each Docker container is based on a Docker image, which specifies the full environment, including which Linux kernel version, which libraries, etc. The default Docekr image is codalab/ubuntu:1.9
, which consists of Ubuntu 14.04 plus some standard packages (e.g., Python).
In CodaLab, when you create a run, you can specify which Docker container you want to use like below.
CodaLab> cl run <command> --request-docker-image <docker-image-to-use>
Here we use lyeoni/pytorch_pretrained_bert docker image that contains PyTorch 1.0.0 (with CUDA 9.0) and pre-trained BERT models re-implemented by PyTorch.
lyeoni/pytorch_pretrained_bert image is based on 2 repository, anibali's PyTorch docker image that contains various CUDA-enabled version of the PyTorch image, and PyTorch Pretrained BERT that contains PyTorch reimplementations, pre-trained models for Google's BERT model.
lyeoni/pytorch_pretrained_bert docker image makes both CUDA-enabled PyTorch and pre-trained BERT model easy to use in CodaLab.
CodaLab> cl run :check_cuda.py "python check_cuda.py" --request-docker-image lyeoni/pytorch_pretrained_bert --request-gpus 1
# Python Version: 3.6.5
# PyTorch Version: 1.0.0
# CUDA Available: True
# Device Count: 1
# Current Device: 0
# Device Name Tesla K80
The final command to generate the predictions on the dev set is like belows.
CodaLab> cl run :config.json :vocab.txt :pytorch_model.bin :KorQuAD_v1.0_dev.json :run_squad.py "python run_squad.py KorQuAD_v1.0_dev.json predictions.json" -n run-predictions --request-docker-image lyeoni/pytorch_pretrained_bert --request-gpus 1 --request-memory 11g
If the bundle state is ready not failed, we extract out the predictions file into a bundle of its own. Let's do this as follows:
# MODELNAME should not contain spaces (avoid using special characters too).
CodaLab> cl make run-predictions/predictions.json -n predictions-{MODELNAME}
Now, let's verify that we can evaluate the predictions on the dev set.
CodaLab> cl macro korquad-utils/dev-evaluate-v1.0 predictions-{MODELNAME}
Once this succeeds, you should see the scores for your model appended to the worksheet.
Follow the submission guide provided in the official site. This step requires only a formatted descriptions(e.g., model name, your name, institution, etc.) to add an official leaderboard.
- In CodaLab, each run has a state, which evolves through the following values:
created
: initial statestaged
: for run bundles, meaning dependencies are readypreparing
: launch a worker just for this run, waiting for itrunning
: a worker is running the commandready/failed
: terminal states corresponding to a successful or unsuccessful run
- [LG CNS AI Research Team] KorQuAD, The Korean Question Answering Dataset
- [LG CNS AI Research Team] KorQuAD Submission Guide (English Ver.)
- [LG CNS AI Research Team] KorQuAD Submission Guide (Korean Ver.)
- [CodaLab] CodaLab, A collaborative platform for reproducible research
- [codalab/codalab-worksheets] Execution
- [codalab/codalab-worksheets] CLI Reference
- [anibali/docker-pytorch] PyTorch Docker image
- [huggingface/pytorch-pretrained-BERT] PyTorch Pretrained BERT: The Big & Extending Repository of pretrained Transformers
- [huggingface/pytorch-pretrained-BERT] Fixing issue "Training beyond specified 't_total' steps with schedule 'warmup_linear'" reported in #556 #604
- [lyeoni/pytorch_pretrained_bert] Docker image with pyotrch and pretrained_bert
- [pbaumgartner/pytorch-bert] pytorch-bert-image