An Hourglass Transformer VQ-VAE architecture.
The LatentLM project requires a first-stage model capable of compressing very long sequences. We achieve this by combining the Hourglass Transformer with FSQ (finite scalar quantization) and Contrastive Weight Tying to construct an attention-only VQ-VAE architecture.
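
For reference, below is a minimal sketch of an FSQ bottleneck: each latent channel is bounded with tanh and rounded to a small number of levels, with a straight-through estimator for gradients, so no codebook or commitment loss is needed. The level configuration and class name are illustrative, not the project's actual settings.

```python
import torch
import torch.nn as nn


class FSQ(nn.Module):
    """Minimal finite scalar quantization sketch (illustrative only).

    Odd level counts are used here for simplicity; the full FSQ
    formulation adds a half-step offset for even level counts.
    """

    def __init__(self, levels=(7, 5, 5, 5)):
        super().__init__()
        self.register_buffer("levels", torch.tensor(levels, dtype=torch.float32))

    def forward(self, z):
        # z: (..., len(levels)); bound each channel to (-half, half),
        # where half is the per-channel number of levels minus one, over two.
        half = (self.levels - 1) / 2
        bounded = torch.tanh(z) * half
        quantized = torch.round(bounded)
        # Straight-through estimator: quantized values in the forward pass,
        # gradients copied from the unquantized path.
        return bounded + (quantized - bounded).detach()
```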
- Linear attention.
- GQA (grouped-query attention).
- FlashAttention2 with sliding window to replace linear attention.
- Attention upsampling to replace linear upsampling (see the sketch after this list).
- (Optional) experiment with adversarial losses (Hourglass VQ-GAN).
- Hyperparameter tuning.
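
As referenced in the upsampling item above, here is a minimal sketch of attention-based upsampling, assuming the Hourglass Transformer scheme: the shortened sequence is naively repeated back to full length, combined with a skip connection from the pre-shortening activations, and used as queries attending over the shortened tokens as keys/values. All names, the shortening factor, and the skip-connection handling are assumptions, not the project's implementation.

```python
import torch
import torch.nn as nn


class AttentionUpsample(nn.Module):
    """Sketch of attention upsampling for an hourglass decoder."""

    def __init__(self, dim, shorten_factor, num_heads=8):
        super().__init__()
        self.shorten_factor = shorten_factor
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, shortened, residual):
        # shortened: (B, T // s, D) compressed sequence from the bottleneck
        # residual:  (B, T, D) pre-shortening activations (U-Net-style skip)
        upsampled = shortened.repeat_interleave(self.shorten_factor, dim=1)
        queries = upsampled + residual
        # Full-resolution queries attend over the shortened keys/values.
        out, _ = self.attn(queries, shortened, shortened)
        return out + queries
```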