
Bottleneck Transformers for Visual Recognition


Paper

Link: https://arxiv.org/pdf/2101.11605.pdf
Year: 2021

Summary

  • Incorporates self-attention into ResNet's bottleneck blocks, improving instance segmentation and object detection while reducing parameter count.
  • Hybrids of convolution and self-attention can beat strong ImageNet baselines; pure-attention ViT models struggle in the small-data regime but shine when data is plentiful.


Methods

The BoT block is a ResNet bottleneck block with the spatial 3×3 convolution replaced by global multi-head self-attention (MHSA). BoTNet is simply a ResNet in which the final three bottleneck blocks (the c5 stage) are swapped for BoT blocks.
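
A minimal sketch of this swap in PyTorch (not the authors' code; `nn.MultiheadAttention` stands in for the paper's MHSA layer, which additionally uses the relative position encodings sketched below):

```python
import torch
import torch.nn as nn

class BoTBlock(nn.Module):
    """ResNet bottleneck block with the 3x3 convolution replaced by global MHSA."""
    def __init__(self, in_ch, mid_ch, out_ch, heads=4):
        super().__init__()
        self.reduce = nn.Sequential(          # 1x1 reduce, as in a standard bottleneck
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        # Stand-in for the paper's MHSA layer (the paper's version also adds
        # 2D relative position encodings -- see the next sketch).
        self.attn = nn.MultiheadAttention(mid_ch, heads, batch_first=True)
        self.expand = nn.Sequential(          # 1x1 expand back to the output width
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        b, _, h, w = x.shape
        y = self.reduce(x)                    # (B, mid, H, W)
        seq = y.flatten(2).transpose(1, 2)    # (B, H*W, mid): one token per position
        seq, _ = self.attn(seq, seq, seq)     # global all-to-all self-attention
        y = seq.transpose(1, 2).reshape(b, -1, h, w)
        return self.act(self.expand(y) + self.shortcut(x))
```

Since attention cost grows quadratically with H×W, BoTNet makes this swap only in the lowest-resolution (c5) stage.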

Uses relative position encodings in the self-attention layers, which yield gains over absolute position encodings.
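
A sketch of how that term can be computed, assuming the factorized 2D relative scheme (Bello et al.) that the paper adopts: attention logits are the content term q·kᵀ plus a content-position term q·rᵀ, where r sums learned row-offset and column-offset embeddings. The class and parameter names here are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosMHSA(nn.Module):
    """MHSA over an (H, W) feature map with factorized 2D relative position logits."""
    def __init__(self, dim, h, w, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dh = heads, dim // heads
        self.qkv = nn.Conv2d(dim, dim * 3, 1, bias=False)
        # One learned embedding per possible row offset and column offset.
        self.rel_h = nn.Parameter(torch.randn(2 * h - 1, self.dh) * 0.02)
        self.rel_w = nn.Parameter(torch.randn(2 * w - 1, self.dh) * 0.02)
        idx_h, idx_w = torch.arange(h), torch.arange(w)
        # rel_idx[i, j] = offset (j - i), shifted to be a non-negative table index
        self.register_buffer("idx_h", idx_h[None, :] - idx_h[:, None] + h - 1)
        self.register_buffer("idx_w", idx_w[None, :] - idx_w[:, None] + w - 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        def split(t):  # (B, C, H, W) -> (B, heads, H*W, dh)
            return t.reshape(b, self.heads, self.dh, h * w).transpose(2, 3)
        q, k, v = split(q), split(k), split(v)
        content = q @ k.transpose(-2, -1)                     # q·kᵀ
        # Content-position term, factorized over rows and columns.
        qg = q.reshape(b, self.heads, h, w, self.dh)
        r_h = self.rel_h[self.idx_h]                          # (H, H', dh)
        r_w = self.rel_w[self.idx_w]                          # (W, W', dh)
        logits_h = torch.einsum("bnxyd,xud->bnxyu", qg, r_h)  # (B, n, H, W, H')
        logits_w = torch.einsum("bnxyd,yvd->bnxyv", qg, r_w)  # (B, n, H, W, W')
        pos = logits_h[..., :, None] + logits_w[..., None, :]
        pos = pos.reshape(b, self.heads, h * w, h * w)
        attn = F.softmax((content + pos) / self.dh ** 0.5, dim=-1)
        out = attn @ v                                        # (B, n, H*W, dh)
        return out.transpose(2, 3).reshape(b, c, h, w)
```

The relative form lets each query attend based on both content and relative distance, so the layer stays aware of spatial offsets without being tied to absolute pixel locations.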

Results

BoTNet reaches 84.7% top-1 accuracy on ImageNet while being up to 1.64× faster in compute time than EfficientNet on TPU-v3, and with Mask R-CNN achieves 44.4% Mask AP and 49.7% Box AP on COCO instance segmentation, improving over the ResNet baseline.