Confusion regarding outputs from the first and second modules
Mishra1995 opened this issue · 4 comments
Hi,
Thanks for implementing and open-sourcing the code for this T2I model.
I ran the first snippet of the code, where the objective was to train a VQGanVAE model.
After training the VQGanVAE for 50K iterations, I trained the MaskGIT module, although I passed fewer images and texts into the MaskGIT training than into the first module's training, since I was running into memory issues.
Nevertheless, I passed 10 images and the corresponding texts to train the super-resolution GIT and saved the resulting images. The following are a few of the images I am getting.
My query is whether the process I am following is correct. Do I need to train on more images to get images that correspond to the texts?
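Concretely, the second/third-stage training I ran looks roughly like the sketch below (adapted from the repo README; the checkpoint path, captions, tensors, and hyperparameters are placeholders, and the exact argument names may differ between versions):

```python
import torch
import torch.nn.functional as F
from torchvision.utils import save_image
from muse_maskgit_pytorch import VQGanVAE, MaskGit, MaskGitTransformer

# reload the VAE trained in the first stage (placeholder checkpoint path)
vae = VQGanVAE(dim = 256, vq_codebook_size = 512).cuda()
vae.load('./vae.pt')

transformer = MaskGitTransformer(
    num_tokens = 512,      # must match the VAE codebook size
    seq_len = 1024,        # fmap_size ** 2 at the target resolution (check against your VAE)
    dim = 512,
    depth = 8,
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    t5_name = 't5-small'   # T5 encoder for the conditioning texts
)

# super-resolution stage: conditioned on a low-res image as well as the text
superres_maskgit = MaskGit(
    vae = vae,
    transformer = transformer,
    image_size = 512,        # output resolution
    cond_image_size = 256,   # conditioning (low-res) image size
    cond_drop_prob = 0.25    # conditional dropout for classifier-free guidance
).cuda()

# my real run used 10 images with captions; random tensors stand in here
images = torch.randn(4, 3, 512, 512).cuda()
texts = ['placeholder caption'] * 4

# during training the low-res conditioning is derived from the images internally
loss = superres_maskgit(images, texts = texts)
loss.backward()

# after training: sample from text plus a downsampled conditioning image, then save
sampled = superres_maskgit.generate(
    texts = ['placeholder caption'],
    cond_images = F.interpolate(images[:1], 256),  # low-res conditioning input
    cond_scale = 3.                                # classifier-free guidance scale
)
save_image(sampled, './sample.png')
```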
Thanks!
@Mishra1995 ohh, @lonzi found a bug in the other issue, do you want to retry with 0.0.19?
also, what kind of results are you seeing with vqgan-vae?
Thanks @lucidrains, I have updated to 0.0.19 and am repeating the steps. I will update this thread once the training is complete!
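For reference, the upgrade itself was just (assuming the package is published on PyPI under the repo name):

```
pip install --upgrade muse-maskgit-pytorch==0.0.19
```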
Hi @Mishra1995, I'm also spending time training this. Would you be open to chatting as we both work on this?
@lucidrains I'm still training the VAE.
I'm training with the following configuration (sketched in code after this list):
- 50 images
- size = 64
- batch_size = 4
- grad_accum_every = 8
- num_train_steps = 50000
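In code, that configuration maps onto the trainer roughly like this (a sketch following the repo README; the image folder path is a placeholder, and argument names may have shifted between versions):

```python
from muse_maskgit_pytorch import VQGanVAE, VQGanVAETrainer

vae = VQGanVAE(
    dim = 256,
    vq_codebook_size = 512
)

trainer = VQGanVAETrainer(
    vae = vae,
    image_size = 64,          # size = 64 from the list above
    folder = './images',      # placeholder path to the 50-image folder
    batch_size = 4,
    grad_accum_every = 8,
    num_train_steps = 50000
).cuda()

trainer.train()
```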
Here are the results at 17,000 training steps.
Does there seem to be a problem with this training configuration, or is it supposed to improve around the 50,000th step?