# BERT-TF2
A BERT reimplementation on TensorFlow v2.3.0.

Work in progress.
Paper: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
## Plans
- BERT model on TF 2.x Keras API (done)
- Loading official BERT weights (see the checkpoint sketch after this list)
- Data parallelism for GPU, TPU
- Should all the data be prepared as TFRecords before training, or would preparing them during training be an unbearable bottleneck? (see the TFRecord sketch after this list)
- Any TPU-inefficient operations (e.g. reshaping tensors)? (link)
- Training ELECTRA based on this model
- Practice model parallelism (Mesh TensorFlow? link), or maybe later (tf.distribute doc link)
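
For the weight-loading item, a minimal sketch of inspecting an official checkpoint with `tf.train`, assuming the standard BERT-Base release layout; the checkpoint path and the commented-out `model` call are placeholders:

```python
# Inspect an official BERT checkpoint and read one tensor from it.
# The path below follows the standard BERT-Base release name; adjust as needed.
import tensorflow as tf

ckpt_path = "uncased_L-12_H-768_A-12/bert_model.ckpt"

# List every (name, shape) pair stored in the checkpoint.
for name, shape in tf.train.list_variables(ckpt_path):
    print(name, shape)

# Read a single tensor by name; this one exists in the official checkpoints.
reader = tf.train.load_checkpoint(ckpt_path)
word_embeddings = reader.get_tensor("bert/embeddings/word_embeddings")
# model.get_layer("embeddings").set_weights([word_embeddings])  # `model` is assumed
```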
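
For the TFRecord question, a minimal sketch of serializing pre-tokenized examples once and streaming them back with `tf.data`; the feature names follow the official BERT data format, while `seq_len` and the toy token values are assumptions:

```python
import tensorflow as tf

seq_len = 128  # assumed maximum sequence length

def serialize(input_ids, input_mask, segment_ids):
    # Pack one pre-tokenized example into a tf.train.Example proto.
    feature = {
        "input_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=input_ids)),
        "input_mask": tf.train.Feature(int64_list=tf.train.Int64List(value=input_mask)),
        "segment_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=segment_ids)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Write a single toy example ([CLS] ... [SEP], zero-padded to seq_len).
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    writer.write(serialize([101, 2023, 102] + [0] * (seq_len - 3),
                           [1] * 3 + [0] * (seq_len - 3),
                           [0] * seq_len))

def parse(record):
    spec = {
        "input_ids": tf.io.FixedLenFeature([seq_len], tf.int64),
        "input_mask": tf.io.FixedLenFeature([seq_len], tf.int64),
        "segment_ids": tf.io.FixedLenFeature([seq_len], tf.int64),
    }
    return tf.io.parse_single_example(record, spec)

dataset = tf.data.TFRecordDataset("train.tfrecord").map(parse).batch(32).prefetch(1)
```

Serializing once up front and reading with `tf.data` (plus `prefetch`) is the usual way to keep preprocessing off the training critical path.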
From the tf.distribute doc:

> Data parallelism is where we run multiple copies of the model on different slices of the input data. This is in contrast to model parallelism, where we divide up a single copy of a model across multiple devices. Note: we only support data parallelism for now, but hope to add support for model parallelism in the future.
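
A minimal sketch of what the data-parallel setup could look like with `tf.distribute.MirroredStrategy`; the tiny `Dense` model and random data are stand-ins for the actual BERT model and input pipeline:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across all replicas.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])  # stand-in for BERT
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Keras splits each global batch across the replicas automatically.
x = tf.random.uniform((256, 16))
y = tf.random.uniform((256,), maxval=2, dtype=tf.int32)
model.fit(x, y, batch_size=64, epochs=1)
```

On TPUs, `tf.distribute.TPUStrategy` plays the same role with the same `strategy.scope()` pattern.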
## Contact
Han Seungho (danhahn61@gmail.com)