/keras-vision-transformer

The Tensorflow, Keras implementation of Swin-Transformer and Swin-UNET

Primary LanguagePythonMIT LicenseMIT

keras-vision-transformer

This repository contains the tensorflow.keras implementation of the Swin Transformer (Liu et al., 2021) and its applications to benchmark datasets.

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. and Guo, B., 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030. https://arxiv.org/abs/2103.14030.

  • Hu, C., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q. and Wang, M., 2021. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv preprint arXiv:2105.05537.

Notebooks

Note: the Swin-UNET implementation is experimental

  • MNIST image classification with Swin Transformers [link]
  • Oxford IIIT Pet image Segmentation with Swin-UNET [link]

Dependencies

  • TensorFlow 2.5.0, Keras 2.5.0, Numpy 1.19.5.

Overview

Swin Transformers are Transformer-based computer vision models that feature self-attention with shift-windows. Compared to other vision transformer variants, which compute embedded patches (tokens) globally, the Swin Transformer computes token subsets through non-overlapping windows that are alternatively shifted within Transformer blocks. This mechanism makes Swin Transformers more suitable for processing high-resolution images. Swin Transformers have shown effectiveness in image classification, object detection, and semantic segmentation problems.

Contact

Yingkai (Kyle) Sha <yingkai@eoas.ubc.ca> <yingkaisha@gmail.com>

The work is benefited from:

  • The official Pytorch implementation of Swin-Transformers [link].
  • Swin-Transformer-TF [link].

License

MIT License