SViT: Hybrid Vision Transformer Models With Scattering Transform

This work was published at the 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP).

Table of Contents

  1. Release Notes
  2. Introduction
  3. Prerequisites
  4. License
  5. Bibtex

Release Notes

  • Release 1.0 (22.06.2022)
    • Git tag: MLSP-v1.0

Introduction

Overview of the model: we propose hybrid ViT models with the scattering transform, called Scattering Vision Transformers (SViT). More specifically, we investigate three scattering-based tokenizations for ViT: patch-wise scattering tokens (SViT-Patch), scattering image feature tokens (SViT-Image), and scattering frequency sub-band response tokens (SViT-Freq).

Prerequisites

Installation

- Clone the repository and install the Python dependencies

$ git clone https://github.com/TianmingQiu/scattering_transformer
$ cd scattering_transformer
$ pip install -r requirements.txt 

Initialization

- Create a local save folder (checkpoint) and a log folder

$ cd scattering_transformer
$ mkdir checkpoint
$ mkdir log

- Download the dataset into the input/dataset folder

$ cd input/dataset

Train Models

- If needed, configure the model parameters in "custom_dataset.py" and "transforms.py"

- In the main function, set the variable "DATA_TYPE" to the dataset you want to test on