Provides the Attention, SelfAttention, MultiHeadAttention, MultiHeadSelfAttention, EncoderLayer, Encoder, DecoderLayer, Decoder, SinusoidalPositionalEncoder, and LearnedPositionalEncoder structures, along with the corresponding functionality, for building transformer models. See the documentation and examples for each in attention.jl and encoder-decoder.jl.
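A minimal sketch of how these pieces might compose into an encoder stack. The constructor arguments and call signatures below are assumptions for illustration only; consult attention.jl and encoder-decoder.jl for the actual API.

```julia
# Hypothetical usage — argument names and signatures are assumed,
# not taken from attention.jl / encoder-decoder.jl.
d_model, n_heads, n_layers, seq_len = 64, 4, 2, 10

# Add position information, then run the encoder stack.
pos = SinusoidalPositionalEncoder(d_model)          # assumed constructor
enc = Encoder(n_layers, d_model, n_heads)           # assumed constructor

x = rand(Float32, d_model, seq_len)                 # (features, sequence length)
h = enc(pos(x))                                     # encoded representation
```

A decoder would be wired analogously from DecoderLayer/Decoder, attending over the encoder output `h`.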