This repository is NOT actively maintained; however, issues and security alerts will be monitored and may be fixed. This code is not directly compatible with HuggingFace Transformers (or models based on it, including Kakao's GPT-3). I do NOT provide active support for TF-to-PyTorch model conversion requests.
- A training script that supports TPUs properly (<10% TPU idle time)
- Fast tokenizer powered by HuggingFace/tokenizers
- Live demo (currently unavailable)
- A 1.5B-parameter GPT-2 Korean model, pretrained on a ~40 GB corpus
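The fast tokenizer listed above is built on the HuggingFace `tokenizers` library. As a minimal sketch of what that looks like in practice (the repo ships its own pretrained vocabulary; the toy corpus and `vocab_size` below are illustrative assumptions, not this repo's actual settings):

```python
# Toy example: train a byte-level BPE tokenizer with HuggingFace tokenizers.
# The real model uses a vocabulary pretrained on the full ~40 GB corpus.
from tokenizers import ByteLevelBPETokenizer

corpus = [
    "안녕하세요, 한국어 토크나이저 예제 문장입니다.",
    "바이트 수준 BPE는 어떤 유니코드 문자열도 다룰 수 있습니다.",
]

tokenizer = ByteLevelBPETokenizer()
# vocab_size=500 is a toy value; a real Korean model would use tens of thousands.
tokenizer.train_from_iterator(corpus, vocab_size=500, min_frequency=1)

encoded = tokenizer.encode(corpus[0])
decoded = tokenizer.decode(encoded.ids)
assert decoded == corpus[0]  # byte-level BPE round-trips the input exactly
```

Byte-level BPE is a convenient choice for Korean because it never produces out-of-vocabulary tokens: any UTF-8 byte sequence can be encoded and losslessly decoded.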
GPT-2 Small through GPT-2 XL have been tested; larger models are not guaranteed to work.
cd KoGPT2-train
export PYTHONPATH=.
python3 train/train_tpu.py --input_file gs://kogpt2/datasets/WEB* --output_dir gs://kogpt2/models/large --max_seq_length 2048 --save_checkpoints_steps 5000 --use_tpu true --tpu_name v3-2 --train_batch_size 16 --config_file configs/large.json --iterations_per_loop 1000 --learning_rate 1e-4
The contents of this repository are for academic research purposes, and we do not provide any conclusive remarks. The underlying model is currently the same as GPT-2; I am working on the alternating layers.
If you want GPT-2, just change the context token length from 2048 to 1024; the model is then practically identical. Refer to the original paper for specific hyperparameter settings.
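For illustration, the context-length change above would be a one-field edit in the model config passed via `--config_file`. The exact field names in this repo's `configs/large.json` are assumptions here, based on standard GPT-2 config conventions:

```json
{
  "vocab_size": 50304,
  "n_ctx": 2048,
  "n_embd": 1600,
  "n_head": 25,
  "n_layer": 48
}
```

Setting `"n_ctx"` to 1024 recovers the original GPT-2 context length; the sizes shown are those of a roughly 1.5B-parameter GPT-2 XL-scale model and may differ from this repo's actual values.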
This research wouldn't have been possible without the TFRC program and NIPA's HPC Support Program.
@misc{KoGPT3,
author = {Seungjae Kim},
title = {KoGPT3: Pretrained for Korean},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ksjae/KoGPT}},
}
Code based on:
- https://github.com/imcaspar/gpt2-ml
- https://github.com/google-research/bert
- https://github.com/rowanz/grover
Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)