Bottleneck Transformers for Visual Recognition
Paper
Link: https://arxiv.org/pdf/2101.11605.pdf
Year: 2021
Summary
- Incorporates self-attention into ResNet's bottleneck blocks (the spatial 3x3 convolutions are replaced by global multi-head self-attention), improving instance segmentation and object detection while reducing the parameter count (see the sketch after this list).
- A hybrid of convolution and self-attention performs strongly on the ImageNet benchmark; pure-attention ViT models struggle in the small-data regime but shine in the large-data regime.
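A minimal PyTorch sketch (my own illustration, not the authors' released code) of the core idea: a bottleneck block whose 3x3 convolution is swapped for global multi-head self-attention over the feature map. The module names, head count, and the omission of position encodings here are simplifications for brevity.

```python
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """Global multi-head self-attention over an H x W feature map (position encoding omitted)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)                     # each: (b, c, h, w)
        # flatten the spatial grid and split channels into heads: (b, heads, h*w, head_dim)
        q, k, v = (t.reshape(b, self.heads, c // self.heads, h * w).transpose(-1, -2)
                   for t in (q, k, v))
        attn = ((q @ k.transpose(-1, -2)) * self.scale).softmax(dim=-1)   # (b, heads, hw, hw)
        out = attn @ v                                                    # (b, heads, hw, head_dim)
        return out.transpose(-1, -2).reshape(b, c, h, w)

class BoTBlock(nn.Module):
    """Bottleneck block: 1x1 conv -> MHSA (in place of the 3x3 conv) -> 1x1 conv, with a residual."""
    def __init__(self, in_dim, bottleneck_dim, heads=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_dim, bottleneck_dim, 1, bias=False),
            nn.BatchNorm2d(bottleneck_dim), nn.ReLU(inplace=True),
            MHSA2d(bottleneck_dim, heads),
            nn.BatchNorm2d(bottleneck_dim), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_dim, in_dim, 1, bias=False),
            nn.BatchNorm2d(in_dim),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.net(x))

x = torch.randn(2, 2048, 14, 14)              # e.g. a c5-stage feature map
print(BoTBlock(2048, 512)(x).shape)           # torch.Size([2, 2048, 14, 14])
```

In the paper this replacement is applied only in the last stage of the ResNet (the final three bottleneck blocks), where the feature map is small enough for global attention to be affordable.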
Methods
Uses relative position encodings, which give gains over absolute position encodings (see the sketch below).
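A hedged sketch of the relative-position term: the attention logit for a (query, key) pair becomes q·k + q·r, where r is a learned embedding of the 2D relative offset, split into height and width components. The class name, the initialization scale, and the memory-heavy way of materializing r for every pair are my own simplifications, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RelPosSelfAttention2d(nn.Module):
    """Self-attention over an H x W grid with learned 2D relative position encodings."""
    def __init__(self, dim, fmap_size, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Conv2d(dim, dim * 3, 1, bias=False)
        h, w = fmap_size
        d = dim // heads
        # one learned embedding per relative offset along each axis
        self.rel_h = nn.Parameter(torch.randn(2 * h - 1, d) * 0.02)
        self.rel_w = nn.Parameter(torch.randn(2 * w - 1, d) * 0.02)
        ih = torch.arange(h)
        iw = torch.arange(w)
        # rel index[query, key] = index of the (key - query) offset, per axis
        self.register_buffer("h_rel", (ih[None, :] - ih[:, None]) + h - 1)  # (h, h)
        self.register_buffer("w_rel", (iw[None, :] - iw[:, None]) + w - 1)  # (w, w)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)
        q, k, v = (t.reshape(b, self.heads, -1, h * w).transpose(-1, -2) for t in (q, k, v))

        content = q @ k.transpose(-1, -2)                        # q·k, shape (b, heads, hw, hw)

        # build r[query, key, d] = rel_h[Δi] + rel_w[Δj], then take the q·r dot product
        r = (self.rel_h[self.h_rel][:, None, :, None, :] +        # (h, 1, h, 1, d)
             self.rel_w[self.w_rel][None, :, None, :, :])         # (1, w, 1, w, d)
        r = r.reshape(h * w, h * w, -1)
        position = torch.einsum("bnqd,qkd->bnqk", q, r)           # q·r, shape (b, heads, hw, hw)

        attn = ((content + position) * self.scale).softmax(dim=-1)
        out = attn @ v                                            # (b, heads, hw, head_dim)
        return out.transpose(-1, -2).reshape(b, c, h, w)

x = torch.randn(1, 512, 8, 8)
print(RelPosSelfAttention2d(512, (8, 8))(x).shape)  # torch.Size([1, 512, 8, 8])
```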


