
Bottleneck Transformers for Visual Recognition


Paper

Link: https://arxiv.org/pdf/2101.11605.pdf
Year: 2021

Summary

  • Incorporates self-attention into ResNet's bottleneck blocks, improving instance segmentation and object detection while reducing parameter count.
  • Hybrids of convolution and self-attention can beat strong ImageNet baselines; pure-attention ViT models struggle in the small-data regime but shine when data is plentiful.


Methods

The BoT block is a ResNet bottleneck block with the spatial 3×3 convolution replaced by global multi-head self-attention (MHSA). BoTNet is simply a ResNet in which the final three bottleneck blocks (the c5 stage) are swapped for BoT blocks.
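
A minimal sketch of this swap in PyTorch (not the authors' code; `nn.MultiheadAttention` stands in for the paper's MHSA layer, which additionally uses the relative position encodings sketched below):

```python
import torch
import torch.nn as nn

class BoTBlock(nn.Module):
    """ResNet bottleneck block with the 3x3 convolution replaced by global MHSA."""
    def __init__(self, in_ch, mid_ch, out_ch, heads=4):
        super().__init__()
        self.reduce = nn.Sequential(          # 1x1 reduce, as in a standard bottleneck
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        # Stand-in for the paper's MHSA layer (the paper's version also adds
        # 2D relative position encodings -- see the next sketch).
        self.attn = nn.MultiheadAttention(mid_ch, heads, batch_first=True)
        self.expand = nn.Sequential(          # 1x1 expand back to the output width
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        b, _, h, w = x.shape
        y = self.reduce(x)                    # (B, mid, H, W)
        seq = y.flatten(2).transpose(1, 2)    # (B, H*W, mid): one token per position
        seq, _ = self.attn(seq, seq, seq)     # global all-to-all self-attention
        y = seq.transpose(1, 2).reshape(b, -1, h, w)
        return self.act(self.expand(y) + self.shortcut(x))
```

Since attention cost grows quadratically with H×W, BoTNet makes this swap only in the lowest-resolution (c5) stage.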

Uses relative position encodings in the self-attention layers, which yield gains over absolute position encodings.
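
A sketch of how that term can be computed, assuming the factorized 2D relative scheme (Bello et al.) that the paper adopts: attention logits are the content term q·kᵀ plus a content-position term q·rᵀ, where r sums learned row-offset and column-offset embeddings. The class and parameter names here are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosMHSA(nn.Module):
    """MHSA over an (H, W) feature map with factorized 2D relative position logits."""
    def __init__(self, dim, h, w, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dh = heads, dim // heads
        self.qkv = nn.Conv2d(dim, dim * 3, 1, bias=False)
        # One learned embedding per possible row offset and column offset.
        self.rel_h = nn.Parameter(torch.randn(2 * h - 1, self.dh) * 0.02)
        self.rel_w = nn.Parameter(torch.randn(2 * w - 1, self.dh) * 0.02)
        idx_h, idx_w = torch.arange(h), torch.arange(w)
        # rel_idx[i, j] = offset (j - i), shifted to be a non-negative table index
        self.register_buffer("idx_h", idx_h[None, :] - idx_h[:, None] + h - 1)
        self.register_buffer("idx_w", idx_w[None, :] - idx_w[:, None] + w - 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        def split(t):  # (B, C, H, W) -> (B, heads, H*W, dh)
            return t.reshape(b, self.heads, self.dh, h * w).transpose(2, 3)
        q, k, v = split(q), split(k), split(v)
        content = q @ k.transpose(-2, -1)                     # q·kᵀ
        # Content-position term, factorized over rows and columns.
        qg = q.reshape(b, self.heads, h, w, self.dh)
        r_h = self.rel_h[self.idx_h]                          # (H, H', dh)
        r_w = self.rel_w[self.idx_w]                          # (W, W', dh)
        logits_h = torch.einsum("bnxyd,xud->bnxyu", qg, r_h)  # (B, n, H, W, H')
        logits_w = torch.einsum("bnxyd,yvd->bnxyv", qg, r_w)  # (B, n, H, W, W')
        pos = logits_h[..., :, None] + logits_w[..., None, :]
        pos = pos.reshape(b, self.heads, h * w, h * w)
        attn = F.softmax((content + pos) / self.dh ** 0.5, dim=-1)
        out = attn @ v                                        # (B, n, H*W, dh)
        return out.transpose(2, 3).reshape(b, c, h, w)
```

The relative form lets each query attend based on both content and relative distance, so the layer stays aware of spatial offsets without being tied to absolute pixel locations.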

Results

BoTNet reaches 84.7% top-1 accuracy on ImageNet while being up to 1.64× faster in compute time than EfficientNet on TPU-v3, and with Mask R-CNN achieves 44.4% Mask AP and 49.7% Box AP on COCO instance segmentation, improving over the ResNet baseline.