Transformer Encoder

Intro

The transformer architecture was first proposed in the paper "Attention Is All You Need" by researchers at Google. Instead of recurrence, it relies on self-attention, letting every position in the input attend to every other position. Its key design goal is a high degree of parallelization, which makes it well suited to large-scale training. The transformer encoder in particular is the backbone of influential models like BERT, while the closely related decoder variant powers OpenAI's GPT and GPT-2.
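
To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. It is an illustration, not the paper's reference implementation: the projection matrices w_q, w_k, w_v and the dimensions are hypothetical placeholders.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q = x @ w_q  # queries, (seq_len, d_k)
    k = x @ w_k  # keys,    (seq_len, d_k)
    v = x @ w_v  # values,  (seq_len, d_v)
    # Similarity of every position with every other position, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all value vectors.
    return weights @ v

# Hypothetical toy dimensions for demonstration.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Note that the whole score matrix is produced by a single matrix multiplication over all positions at once, rather than step by step as in a recurrent network; this is the source of the parallelization mentioned above.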