
Hourglass VQ-VAE

An Hourglass Transformer VQ-VAE architecture.

Goal

As part of the LatentLM project, we need a first-stage model capable of compressing very long sequences. We achieve this by combining the Hourglass Transformer with FSQ (Finite Scalar Quantization) and Contrastive Weight Tying to construct an attention-only VQ-VAE architecture.
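To make the pipeline concrete, here is a minimal sketch of the model's overall shape, assuming PyTorch. All names and hyperparameters (`HourglassVQVAE`, the shortening factor, the FSQ levels) are illustrative assumptions, not taken from this repository, and Contrastive Weight Tying is omitted for brevity.

```python
import torch
import torch.nn as nn

class FSQ(nn.Module):
    """Finite Scalar Quantization: round each latent channel to a small fixed
    grid; the implicit codebook is the product of the per-channel levels."""
    def __init__(self, levels=(8, 6, 5)):  # levels are an illustrative choice
        super().__init__()
        self.register_buffer("levels", torch.tensor(levels, dtype=torch.float32))

    def forward(self, z):
        # z: (batch, seq, len(levels)); bound each channel to [-1, 1]
        z = torch.tanh(z)
        half = (self.levels - 1) / 2
        z_q = torch.round(z * half) / half
        # straight-through estimator: gradients bypass the rounding
        return z + (z_q - z).detach()

class Block(nn.Module):
    """Pre-norm transformer block."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class HourglassVQVAE(nn.Module):
    def __init__(self, dim=64, shorten=4, fsq_levels=(8, 6, 5)):
        super().__init__()
        self.shorten = shorten
        self.enc = Block(dim)
        # shortening: fold `shorten` neighbouring tokens into one
        self.down = nn.Linear(dim * shorten, dim)
        self.mid = Block(dim)
        self.to_fsq = nn.Linear(dim, len(fsq_levels))
        self.fsq = FSQ(fsq_levels)
        self.from_fsq = nn.Linear(len(fsq_levels), dim)
        # upsampling: linear expansion back to the original token rate
        self.up = nn.Linear(dim, dim * shorten)
        self.dec = Block(dim)

    def forward(self, x):  # x: (B, L, dim), L divisible by `shorten`
        B, L, D = x.shape
        h = self.enc(x)
        h = self.down(h.reshape(B, L // self.shorten, D * self.shorten))
        h = self.mid(h)
        z = self.fsq(self.to_fsq(h))  # discrete bottleneck
        h = self.from_fsq(z)
        h = self.up(h).reshape(B, L, D)
        return self.dec(h + x)  # hourglass-style residual from the full-rate stream

x = torch.randn(2, 32, 64)
print(HourglassVQVAE()(x).shape)  # torch.Size([2, 32, 64])
```

Because the bottleneck runs at a quarter of the input rate here, each FSQ code summarizes several input tokens, which is what lets the first stage compress long sequences.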

TODO

  • Linear attention.
  • GQA.
  • FlashAttention2 with sliding window to replace linear attention (see the sketch after this list).
  • Attention upsampling to replace linear upsampling.
  • (Optional) Experiment with adversarial losses (Hourglass VQ-GAN).
  • Hyperparameter tuning.
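
For the sliding-window item, the attention pattern is roughly the following, again assuming PyTorch; `sliding_window_mask` and the window size are hypothetical names for illustration. A fused kernel such as FlashAttention2's native windowing would never materialize this mask, but the masked computation below is semantically equivalent.

```python
import torch
import torch.nn.functional as F

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks the keys each query may attend to:
    only the `window` most recent positions (causal and local)."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(8, window=3)
# scaled_dot_product_attention accepts a boolean attn_mask (True = keep)
q = k = v = torch.randn(1, 1, 8, 16)  # (batch, heads, seq, head_dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 1, 8, 16])
```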