
KalBert

Korean ALBERT (A Lite BERT for Self-supervised Learning of Language Representations) language model

Training is based on albert_zh (https://github.com/brightmart/albert_zh)

Large KalBert (maximum sequence length 512): https://drive.google.com/drive/folders/1a_yZIidugit3TxF__f8LSRPc8gfO2CV-?usp=sharing

  • Training data: ~6GB

  • Morpheme tokenization without POS tags, followed by BPE (see the first sketch after this list)

    • e.g. 이순신은 조선 중기의 무신이다. ("Yi Sun-sin was a military officer of the mid-Joseon period.") -> 이순신 은 조선 중기 의 무신 이 다 .
  • Training steps: 191,000

  • KorQuAD v1.0 dev set (see the evaluation sketch after this list)

    • F1: 90.01, EM: 81.26
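
A minimal sketch of the two-stage tokenization described above: morpheme segmentation with POS tags discarded, then BPE subword splitting. The use of Mecab (via konlpy) and the SentencePiece model filename `kalbert.model` are assumptions for illustration; the repository may use different tools.

```python
# Sketch of: morpheme tokenization without POS tags, followed by BPE.
# Mecab and the "kalbert.model" BPE file are hypothetical choices here.
from konlpy.tag import Mecab
import sentencepiece as spm

mecab = Mecab()
sp = spm.SentencePieceProcessor()
sp.load("kalbert.model")  # hypothetical trained BPE model

def tokenize(sentence: str) -> list:
    # Step 1: split into morphemes, dropping the POS tag for each morpheme.
    morphs = mecab.morphs(sentence)  # e.g. ['이순신', '은', '조선', ...]
    # Step 2: apply BPE to the space-joined morpheme sequence.
    return sp.encode_as_pieces(" ".join(morphs))

print(tokenize("이순신은 조선 중기의 무신이다."))
```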
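For reference, EM and F1 scores like those reported above are computed in the standard SQuAD style. The sketch below is a simplified version of that metric, not code from this repository; the official KorQuAD evaluation script differs in its normalization details.

```python
# Simplified SQuAD-style EM/F1 over whitespace tokens.
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace; the official script also
    # applies additional normalization (e.g. punctuation stripping).
    return " ".join(text.lower().split())

def exact_match(pred: str, gold: str) -> float:
    # EM: 1.0 only if the normalized strings are identical.
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    # F1: harmonic mean of token-level precision and recall.
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("무신", "무신"))          # 1.0
print(f1_score("조선 중기 의 무신", "무신"))  # 0.4 (partial overlap)
```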