BERT-TF2

BERT reimplementation on TensorFlow v2.3.0

Work in progress.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper | Official code | TF/models code

Plans

  • BERT model on TF 2.x Keras API (done)
  • Loading official BERT weights
  • Data parallelism for GPU, TPU
  • Should all the data be prepared as TFRecords before training, or would preparing it on the fly during training be an unbearable bottleneck? (see the sketch after this list)
  • Any TPU-inefficient operations? (e.g. reshaping tensors) link
  • Training ELECTRA based on this model
  • Practice model parallelism (Mesh TensorFlow? link) (or maybe later; tf.distribute doc link)
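
On the TFRecord question, here is a minimal sketch of pre-serializing examples and reading them back with tf.data. The feature names (`input_ids`, `input_mask`, `segment_ids`) and the sequence length follow the original BERT convention and are assumptions, not this repo's actual pipeline; the point is that training then only parses fixed-length integer features, which is cheap compared to tokenizing and masking on the fly.

```python
import tensorflow as tf

MAX_SEQ_LEN = 128  # hypothetical sequence length

def serialize_example(input_ids, input_mask, segment_ids):
    """Pack one pretraining example into a tf.train.Example."""
    def int_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))
    features = tf.train.Features(feature={
        "input_ids": int_feature(input_ids),
        "input_mask": int_feature(input_mask),
        "segment_ids": int_feature(segment_ids),
    })
    return tf.train.Example(features=features).SerializeToString()

def make_dataset(tfrecord_pattern, batch_size):
    """Read pre-built TFRecords; only fixed-length ints are parsed at training time."""
    name_to_features = {
        "input_ids": tf.io.FixedLenFeature([MAX_SEQ_LEN], tf.int64),
        "input_mask": tf.io.FixedLenFeature([MAX_SEQ_LEN], tf.int64),
        "segment_ids": tf.io.FixedLenFeature([MAX_SEQ_LEN], tf.int64),
    }
    files = tf.data.Dataset.list_files(tfrecord_pattern)
    ds = files.interleave(tf.data.TFRecordDataset,
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.map(lambda r: tf.io.parse_single_example(r, name_to_features),
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return (ds.shuffle(10_000)
              .batch(batch_size, drop_remainder=True)
              .prefetch(tf.data.experimental.AUTOTUNE))
```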

tf.distribute doc

Data parallelism is where we run multiple copies of the model on different slices of the input data. This is in contrast to model parallelism where we divide up a single copy of a model across multiple devices. Note: we only support data parallelism for now, but hope to add support for model parallelism in the future.
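
As a concrete example, data parallelism over a Keras model needs little more than building and compiling the model inside a strategy scope. The tiny model and random data below are stand-ins (not this repo's BERT implementation), just to show the mechanics:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
# For TPU (roughly):
#   resolver = tf.distribute.cluster_resolver.TPUClusterResolver(...)
#   tf.config.experimental_connect_to_cluster(resolver)
#   tf.tpu.experimental.initialize_tpu_system(resolver)
#   strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Stand-in for the repo's BERT model; any tf.keras.Model is handled the same way.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(30522, 128),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(2),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Toy random data standing in for a real input pipeline.
inputs = tf.random.uniform([1024, 128], maxval=30522, dtype=tf.int32)
labels = tf.random.uniform([1024], maxval=2, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((inputs, labels)).batch(64)

# Keras splits each global batch across replicas; every copy of the model
# sees a different slice of the data and gradients are all-reduced.
model.fit(dataset, epochs=1)
```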

Contact

Han Seungho (danhahn61@gmail.com)