LookHere

LookHere is a position encoding for ViTs. Code and pretrained models are coming very soon.

ImageNet-HR dataset: https://huggingface.co/datasets/antofuller/ImageNet-HR
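
A minimal sketch of loading ImageNet-HR with the Hugging Face datasets library; the split name and column names are assumptions, so check the dataset card on the Hub for the actual schema.

    # Sketch: load ImageNet-HR from the Hugging Face Hub (split/field names assumed)
    from datasets import load_dataset

    ds = load_dataset("antofuller/ImageNet-HR", split="test")  # assumed split name
    example = ds[0]
    print(example.keys())  # inspect the actual fields, e.g. "image", "label"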

Please Cite

@misc{fuller2024lookhere,
      title={LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate}, 
      author={Anthony Fuller and Daniel G. Kyrollos and Yousef Yassin and James R. Green},
      year={2024},
      eprint={2405.13985},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}