MLP-Mixer: an all-MLP architecture composed of alternate token-mixing and channel-mixing operations.
-
The
token-mixing
is like involution in terms of channel-agnostic weights, but involution is more flexible with spatial-specific weights. This difference makes involution more friendly to transfer to downstream tasks, such as detection and segmentation. -
The
channel-mixing
is like 1x1 convolution, permiting channel information exchange.
The combination of the above two is similar to replacing 3x3 convolution in the ResNet bottleneck block with involution, while maintaining the 1x1 convolution, giving rise to our convolution-free, attention-free architecture RedNet.
Anyway, the take-home message is common: fully-MLP based architecture could rival convolution or self-attention based architectures.
The implementation is based on the JAX/Flax code in the Appendix of the original paper.