lucidrains/muse-maskgit-pytorch

Confusion regarding outputs from first and second module

Mishra1995 opened this issue · 4 comments

Hi,

Thanks for implementing and open-sourcing the code for this T2I model.

I ran the first snippet of the code where the objective was to train a VQGanVAE model.
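For reference, this first stage is roughly the README's opening snippet; the folder path is a placeholder, and argument names (e.g. the codebook size) may vary between versions:

```python
import torch
from muse_maskgit_pytorch import VQGanVAE, VQGanVAETrainer

vae = VQGanVAE(
    dim = 256,
    codebook_size = 65536     # size of the discrete latent codebook
)

trainer = VQGanVAETrainer(
    vae = vae,
    image_size = 128,             # start small, then move to larger images
    folder = '/path/to/images',   # placeholder path to the training images
    batch_size = 4,
    grad_accum_every = 8,
    num_train_steps = 50000
).cuda()

trainer.train()
```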

After training the VQGanVAE model for 50K iterations, I trained the MaskGit module, though with fewer images and texts than in the first stage, since I was running into memory issues.
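The second stage looked roughly like the base MaskGit snippet from the README; the caption, image tensor, and checkpoint path below are stand-ins for my actual data, and some argument names may differ by version:

```python
import torch
from muse_maskgit_pytorch import VQGanVAE, MaskGit, MaskGitTransformer

# reuse the VAE trained in the first stage
vae = VQGanVAE(
    dim = 256,
    codebook_size = 65536
).cuda()

vae.load('/path/to/vae.pt')   # placeholder checkpoint path

transformer = MaskGitTransformer(
    num_tokens = 65536,   # must match the VAE codebook size
    seq_len = 256,        # must equal the VAE's fmap_size ** 2
    dim = 512,
    depth = 8,
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    t5_name = 't5-small'  # T5 used to encode the text conditioning
)

base_maskgit = MaskGit(
    vae = vae,
    transformer = transformer,
    image_size = 256,
    cond_drop_prob = 0.25   # conditional dropout for classifier-free guidance
).cuda()

# stand-ins for the real (image, text) training pairs
texts = ['a photo of a dog']
images = torch.randn(1, 3, 256, 256).cuda()

loss = base_maskgit(images, texts = texts)
loss.backward()
```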

Nevertheless, I passed 10 images and their corresponding texts to train the super-resolution MaskGit and saved the generated images. The following are a few of the images I am getting (the setup is sketched after the samples):

[generated samples: maskgit_2, maskgit_0, maskgit_1]
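The super-resolution stage reuses the vae, transformer, and base_maskgit from the snippet above; it is the same MaskGit construction with cond_image_size set, again roughly per the README, with placeholder data:

```python
import torch
import torch.nn.functional as F
from torchvision.utils import save_image

from muse_maskgit_pytorch import MaskGit

# same construction as the base stage, except cond_image_size is set,
# so the transformer conditions on a low-res version of the image
superres_maskgit = MaskGit(
    vae = vae,                 # VAE from the first stage
    transformer = transformer, # transformer from the snippet above
    cond_drop_prob = 0.25,
    image_size = 512,
    cond_image_size = 256
).cuda()

# training step on placeholder data
texts = ['a photo of a dog']
images = torch.randn(1, 3, 512, 512).cuda()

loss = superres_maskgit(images, texts = texts)
loss.backward()

# after training: upsample low-res samples from the base stage and save them
lowres = base_maskgit.generate(texts = texts, cond_scale = 3.)

samples = superres_maskgit.generate(
    texts = texts,
    cond_images = F.interpolate(lowres, 256),   # condition on the low-res samples
    cond_scale = 3.
)

for i, sample in enumerate(samples):
    save_image(sample, f'maskgit_{i}.png')
```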

My question is whether this is the correct process to follow. Do I need to train on more images to get outputs that actually correspond to the text?

Thanks!

@Mishra1995 ohh, @lonzi found a bug in the other issue, do you want to retry with 0.0.19?

also, what kind of results are you seeing with vqgan-vae?

Thanks @lucidrains, I have updated the package to 0.0.19 and am repeating the steps. I will post an update in this thread once training is complete!

Hi @Mishra1995, I'm also spending time training this. Would you be open to chatting as we both work on this?

@lucidrains I'm still training the VAE.

I'm training with the following settings (sketched as code after the list):

  • 50 images
  • size = 64
  • batch_size = 4
  • grad_accum_every = 8
  • num_train_steps = 50000
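For concreteness, that corresponds roughly to the following trainer call, assuming "size" means image_size and with a placeholder folder path:

```python
from muse_maskgit_pytorch import VQGanVAE, VQGanVAETrainer

vae = VQGanVAE(dim = 256, codebook_size = 65536)

trainer = VQGanVAETrainer(
    vae = vae,
    image_size = 64,                 # "size = 64" above
    folder = '/path/to/50-images',   # placeholder; folder of the 50 training images
    batch_size = 4,
    grad_accum_every = 8,
    num_train_steps = 50000
).cuda()

trainer.train()
```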

Here are the results at 17,000 training steps:

[reconstruction samples at 17,000 steps]

Does there seem to be a problem with this training configuration, or is it supposed to improve by the 50,000th step?