lucidrains/transfusion-pytorch

Example of using a VAE for images

Closed · 3 comments

Has anyone succeeded in using a VAE as the image encoder/decoder? How should it be set up?

from diffusers.models import AutoencoderKL
from transfusion_pytorch import Transfusion

vae = AutoencoderKL.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    subfolder = "vae",
)

model = Transfusion(
    num_text_tokens = 1,
    dim_latent = 4,                     # the SD v1.4 VAE has 4 latent channels, so 384 will not match
    channel_first_latent = True,        # VAE latents come as (batch, channels, height, width)
    modality_default_shape = (4, 4),
    modality_encoder = vae.encoder,     # note: the bare encoder outputs 8 channels (mean and logvar), see the sketch below
    modality_decoder = vae.decoder,
    transformer = dict(
        dim = 512,
        depth = 8
    )
)
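
For reference, a minimal sketch of one way the shapes could line up, assuming the diffusers AutoencoderKL API and the Transfusion constructor above. The bare vae.encoder returns the distribution moments (2 x 4 = 8 channels) and skips the quant convolutions, so thin wrappers around vae.encode / vae.decode that expose the 4-channel latent are one option; the wrapper class names here are made up for illustration.

from torch import nn
from diffusers.models import AutoencoderKL
from transfusion_pytorch import Transfusion

vae = AutoencoderKL.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    subfolder = "vae",
)

class VaeEncoder(nn.Module):
    # hypothetical wrapper: vae.encode returns a posterior over a
    # (batch, 4, h / 8, w / 8) latent; sampling it yields the 4 channels
    # that dim_latent must match
    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, image):
        return self.vae.encode(image).latent_dist.sample()

class VaeDecoder(nn.Module):
    # hypothetical wrapper: maps the 4-channel latent back to pixel space
    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, latent):
        return self.vae.decode(latent).sample

model = Transfusion(
    num_text_tokens = 1,
    dim_latent = 4,                     # matches the VAE's latent channels
    channel_first_latent = True,
    modality_default_shape = (4, 4),    # a 4 x 4 latent decodes to 32 x 32 pixels at the VAE's 8x downsampling
    modality_encoder = VaeEncoder(vae),
    modality_decoder = VaeDecoder(vae),
    transformer = dict(
        dim = 512,
        depth = 8
    )
)

With this wiring, dim_latent, the encoder output, and the decoder input all agree on 4 channels; how Transfusion invokes the encoder may vary across versions of the repo, so treat the wrappers as a starting point rather than a drop-in.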

@dingkwang i'll work on a simplified autoencoder training wrapper over at vq-pytorch, then allow for easy integration here

let's keep this open until i finish that (and as a reminder)

@dingkwang hey Dingkang

do you want to read this and see if it is self-explanatory? i'll embark on that autoencoder wrapper this week as well

@dingkwang think it is working given #22