Layers are the fundamental building blocks for NLP models. They can be used to assemble new layers, networks, or models.
TransformerEncoderBlock implements a single transformer encoder block (self-attention followed by a feed-forward network) with optional attention masking, as described in "Attention Is All You Need".
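The masking part of the block is easiest to see in the attention step alone. The sketch below is a minimal pure-Python illustration of single-head scaled dot-product attention, assuming no learned projections, residual connections, layer normalization, or feed-forward sublayer. The names `softmax` and `masked_attention` are hypothetical and do not reflect the TransformerEncoderBlock API. Masked positions receive a large negative logit, so their softmax weight is effectively zero.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(q, k, v, mask):
    """Hypothetical single-head attention.

    q: [T_q][d], k: [T_k][d], v: [T_k][d_v], mask: [T_q][T_k] of 0/1,
    where 1 means "query i may attend to key j".
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            score = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            # Masked-out keys get a large negative logit, so softmax gives ~0.
            logits.append(score if mask[i][j] else -1e9)
        weights = softmax(logits)
        # Output row i is the attention-weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

With the mask `[[1, 0], [1, 1]]`, the first query can only see the first key, so its output equals the first value row; the second query mixes both value rows.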
OnDeviceEmbedding implements efficient embedding lookups designed for TPU-based models.
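One common way to make embedding lookups TPU-friendly is to express the row gather as a one-hot matrix multiplication, which maps onto the accelerator's matrix units. The pure-Python sketch below shows that the two formulations return identical rows. The function names are hypothetical, and this illustrates the general technique rather than OnDeviceEmbedding's actual implementation.

```python
def gather_lookup(table, ids):
    # Direct indexing: pick row `i` of the [vocab][width] table for each id.
    return [list(table[i]) for i in ids]


def one_hot_lookup(table, ids):
    # Equivalent lookup written as one_hot(ids) @ table.
    vocab = len(table)
    width = len(table[0])
    out = []
    for i in ids:
        one_hot = [1.0 if v == i else 0.0 for v in range(vocab)]
        # [vocab] row vector times [vocab][width] matrix -> [width] row.
        out.append([sum(one_hot[v] * table[v][c] for v in range(vocab))
                    for c in range(width)])
    return out
```

The one-hot form trades extra memory and arithmetic (a `[num_ids, vocab]` matrix) for a dense matmul, so it tends to pay off mainly when the vocabulary is small.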
PositionalEmbedding creates learned position embeddings, as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
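In BERT, position embeddings are a learned table with one row per position up to a maximum sequence length; the first `seq_len` rows are added to the token embeddings of every sequence in the batch. The pure-Python sketch below assumes that setup. The function name is hypothetical, and whether the layer itself performs the addition or only returns the sliced table is not specified here.

```python
def add_position_embeddings(token_embeddings, position_table):
    """Hypothetical helper.

    token_embeddings: [batch][seq_len][width]
    position_table:   [max_length][width] (learned weights in a real model)
    """
    seq_len = len(token_embeddings[0])
    if seq_len > len(position_table):
        raise ValueError("sequence is longer than the position table")
    # Slice the first seq_len rows; the same slice is shared across the batch.
    positions = position_table[:seq_len]
    return [[[t + p for t, p in zip(tok, pos)]
             for tok, pos in zip(seq, positions)]
            for seq in token_embeddings]
```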
SelfAttentionMask creates a 3D attention mask from a 2D padding mask, turning a `[batch, to_seq_length]` mask into a `[batch, from_seq_length, to_seq_length]` mask.
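For self-attention over a padded batch, the 3D mask simply repeats each sequence's 2D padding mask once per query position, so every query can attend to the same non-padding keys. A minimal pure-Python sketch, assuming 1 marks a real token and 0 marks padding; the function name is hypothetical.

```python
def self_attention_mask(to_mask, from_seq_length):
    """to_mask: [batch][to_seq_length] of 0/1.

    Returns [batch][from_seq_length][to_seq_length], where entry [b][i][j]
    says whether query i in sequence b may attend to key j.
    """
    # Broadcast each padding row across all query positions (copying rows).
    return [[list(row) for _ in range(from_seq_length)] for row in to_mask]
```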
MaskedLM implements a masked language model head. It expects the input embedding table variable to be passed to it, so the output projection can reuse (tie) the input embedding weights.
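Passing in the embedding table lets the head score each vocabulary entry by taking a dot product between a hidden state and that entry's embedding row, which is equivalent to multiplying by the transposed table. The pure-Python sketch below assumes a single unbatched sequence and omits the intermediate dense/activation/normalization transform a real head typically applies; the function name and `bias` parameter are hypothetical.

```python
def masked_lm_logits(sequence_output, masked_positions, embedding_table, bias=None):
    """Hypothetical tied-weight MLM scoring.

    sequence_output: [seq_len][width] encoder outputs
    masked_positions: indices of the masked tokens to predict
    embedding_table: [vocab][width], the same table used for input lookups
    Returns [num_masked][vocab] logits.
    """
    vocab = len(embedding_table)
    if bias is None:
        bias = [0.0] * vocab
    logits = []
    for pos in masked_positions:
        # Gather the hidden state at each masked position only.
        h = sequence_output[pos]
        # Dot with every embedding row == h @ embedding_table^T.
        logits.append([sum(a * b for a, b in zip(h, row)) + bias[v]
                       for v, row in enumerate(embedding_table)])
    return logits
```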
Encoders are combinations of layers (and possibly other encoders). They are sub-units of models and would not be trained on their own. Each one encapsulates a common network structure, such as a classification head or a transformer encoder, in an easily handled object with a standardized configuration.
- BertEncoder implements a bi-directional Transformer-based encoder as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". It includes the embedding lookups, transformer layers and pooling layer.
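The three stages named for BertEncoder compose in a fixed order: summed embedding lookups, a stack of transformer layers, then a pooler over the first token. The pure-Python sketch below shows only that data flow. It assumes word, position, and segment (type) embeddings are summed, omits the embedding layer normalization and dropout, and takes the transformer layers as plain callables. All names are hypothetical and do not describe BertEncoder's constructor or outputs.

```python
import math


def bert_like_encoder(token_ids, type_ids, word_table, position_table,
                      type_table, layers, pool_weights, pool_bias):
    """Hypothetical composition: embeddings -> transformer layers -> pooler.

    token_ids, type_ids: [seq_len] ints
    *_table: [rows][width] embedding tables
    layers: callables mapping [seq_len][width] -> [seq_len][width]
    pool_weights: [width][width], pool_bias: [width]
    Returns (sequence_output, pooled_output).
    """
    # 1. Embedding lookups: word + position + segment embeddings, summed.
    hidden = [
        [w + p + t for w, p, t in
         zip(word_table[tok], position_table[pos], type_table[typ])]
        for pos, (tok, typ) in enumerate(zip(token_ids, type_ids))
    ]
    # 2. Transformer layers applied one after another.
    for layer in layers:
        hidden = layer(hidden)
    # 3. Pooler: dense layer + tanh on the first ([CLS]) position.
    first = hidden[0]
    pooled = [
        math.tanh(sum(first[i] * pool_weights[i][j] for i in range(len(first)))
                  + pool_bias[j])
        for j in range(len(pool_bias))
    ]
    return hidden, pooled
```

Passing identity layers, for example `[lambda h: h]`, is a quick way to check the embedding and pooling stages in isolation.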