Lightweight self-attention block intended as a drop-in replacement for convolutional, fully connected, and multi-head self-attention layers.
A content-aware, sparsely connected layer built on the self-attention mechanism, with fewer parameters and lower computational cost than a fully connected layer with the same number of input and output channels.
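
To illustrate the idea, below is a minimal PyTorch sketch of one way such a layer could look. The class name `LightSelfAttention`, the group count, and the block-diagonal (grouped) connectivity are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch of a content-aware, sparsely connected self-attention layer.
# All names and hyper-parameters here are assumptions for illustration only.
import torch
import torch.nn as nn


class LightSelfAttention(nn.Module):
    """Content-aware, sparsely connected alternative to a fully connected layer.

    Input/output shape: (batch, channels). Channels are split into `groups`;
    each group is mixed by an attention map computed from the input itself,
    so the connectivity is sparse (block-structured) and data-dependent.
    """

    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        self.groups = groups
        self.dim = channels // groups
        # Small per-group projections instead of one dense channels x channels
        # weight matrix, which is where the parameter savings come from.
        self.query = nn.Linear(self.dim, self.dim, bias=False)
        self.key = nn.Linear(self.dim, self.dim, bias=False)
        self.value = nn.Linear(self.dim, self.dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape
        # Treat each channel group as a "token": (batch, groups, dim)
        t = x.view(b, self.groups, self.dim)
        q, k, v = self.query(t), self.key(t), self.value(t)
        # Content-aware mixing: the attention weights depend on the input.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dim ** 0.5, dim=-1)
        out = attn @ v
        return out.reshape(b, c)


if __name__ == "__main__":
    layer = LightSelfAttention(channels=256, groups=8)
    x = torch.randn(4, 256)
    print(layer(x).shape)  # torch.Size([4, 256])
    # Rough parameter comparison against a same-sized fully connected layer.
    fc = nn.Linear(256, 256)
    print(sum(p.numel() for p in layer.parameters()),  # 3 * 32 * 32 = 3072
          sum(p.numel() for p in fc.parameters()))     # 65792
```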