This looks extremely similar to Paella (not sure which one is the better approach)
Mut1nyJD opened this issue · 5 comments
The only difference is that this one uses masked tokens while Paella uses noised tokens
not sure which one is the better approach
we'll just have to get the code out there for people to try!
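For what it's worth, the two corruption schemes can be sketched side by side. This is a hypothetical illustration only — the `vocab_size`, `mask_id`, and uniform noising rule are assumptions, not taken from either codebase:

```python
import torch

vocab_size = 1024
mask_id = vocab_size  # assumed: one extra [MASK] id appended after the codebook


def corrupt_masked(tokens: torch.Tensor, ratio: float) -> torch.Tensor:
    """MaskGiT-style: replace a random subset of tokens with [MASK]."""
    mask = torch.rand(tokens.shape) < ratio
    return torch.where(mask, torch.full_like(tokens, mask_id), tokens)


def corrupt_noised(tokens: torch.Tensor, ratio: float) -> torch.Tensor:
    """Paella-style: replace a random subset with random codebook tokens."""
    mask = torch.rand(tokens.shape) < ratio
    noise = torch.randint(0, vocab_size, tokens.shape)
    return torch.where(mask, noise, tokens)


tokens = torch.randint(0, vocab_size, (1, 16))
print(corrupt_masked(tokens, 0.5))
print(corrupt_noised(tokens, 0.5))
```

Either way the model is trained to recover the original tokens; the difference is just whether the corrupted positions carry zero information (a mask id) or misleading information (random codes).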
@Mut1nyJD it goes way back actually, to Mask-Predict, VQ-Diffusion, then the breakout happened with MaskGit, followed by Phenaki
Paella is basically MaskGiT, but all convolutions. Not sure if I believe in that, after all that I have seen
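The shared decoding trick behind both is MaskGiT's parallel iterative sampling: predict all positions at once, keep the most confident predictions, re-mask the rest, and repeat on a shrinking schedule. A toy sketch, assuming a cosine schedule and using a random-logits stand-in for the network:

```python
import math

import torch

vocab_size, mask_id, seq_len, steps = 1024, 1024, 16, 8


def model(tokens):
    # placeholder for a real token transformer: random logits over the codebook
    return torch.randn(*tokens.shape, vocab_size)


tokens = torch.full((1, seq_len), mask_id)
for t in range(steps):
    probs = model(tokens).softmax(dim=-1)
    confidence, pred = probs.max(dim=-1)
    # already-decoded tokens stay fixed: give them infinite confidence
    confidence = torch.where(tokens == mask_id, confidence, torch.tensor(float('inf')))
    # cosine schedule: how many tokens may remain masked after this step
    num_masked = math.floor(seq_len * math.cos(math.pi / 2 * (t + 1) / steps))
    # accept all predictions, then re-mask the least confident ones
    ranked = confidence.argsort(dim=-1)  # ascending: least confident first
    tokens = torch.where(tokens == mask_id, pred, tokens)
    tokens.scatter_(1, ranked[:, :num_masked], mask_id)

print(tokens)  # fully decoded after `steps` iterations
```

At the final step the schedule reaches zero, so every position holds a real codebook token. Paella runs essentially this loop with a convolutional backbone and noised rather than masked corruption.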
True, I completely forgot about Phenaki because it was tailored to video, but in the end, yes, you are right.
Still, the big difference / novelty between this and Phenaki is not obvious to me from skimming through their project page
@Mut1nyJD the battle is far from over
i'm guessing someone will try an all-attention approach for latent diffusion next. they also did not compare to progressively distilled DDPM models, so the jury is still out on which is more efficient
@lucidrains There was a paper out in December by William Peebles building a latent diffusion model with only ViT-style attention blocks. From a cursory glance, adding residual gating and using a really high EMA update factor were essential for training stability. Unfortunately, they only published quantitative results on ImageNet, and also did not compare results with distilled DDIM models.
https://arxiv.org/pdf/2212.09748.pdf
https://www.wpeebles.com/DiT.html
https://github.com/facebookresearch/DiT
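For readers skimming the paper, the "residual gating" mentioned above is (as I read it) DiT's adaLN-Zero conditioning: the per-block gate projections are zero-initialized so each block starts as the identity map. A rough sketch — the dimensions, head count, and module layout here are illustrative assumptions, not the repo's actual code:

```python
import torch
from torch import nn


class AdaLNZeroBlock(nn.Module):
    """One DiT-style block: shift/scale/gate modulation from a conditioning vector."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        # conditioning -> 6 modulation params: shift/scale/gate for attn and mlp
        self.to_mod = nn.Linear(dim, dim * 6)
        nn.init.zeros_(self.to_mod.weight)  # zero-init: block is identity at start
        nn.init.zeros_(self.to_mod.bias)

    def forward(self, x, cond):
        s1, b1, g1, s2, b2, g2 = self.to_mod(cond).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        return x + g2.unsqueeze(1) * self.mlp(h)


block = AdaLNZeroBlock(64)
x = torch.randn(2, 16, 64)
cond = torch.randn(2, 64)
out = block(x, cond)
print(torch.allclose(out, x))  # True: zeroed gates make the block a no-op at init
```

That identity-at-init behavior is plausibly what buys the training stability the paper reports, alongside the high EMA decay.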