Confusion regarding outputs from the first and second modules
Mishra1995 opened this issue · 4 comments
Hi,
Thanks for implementing and open-sourcing the code for this T2I model.
I ran the first snippet of the code, where the objective was to train a VQGanVAE model.
After training the VQGanVAE for 50K iterations, I trained the MaskGIT module, although I passed fewer images and texts into the MaskGIT training than into the first module's training, since I was running into memory issues.
Nevertheless, I passed 10 images and the corresponding texts to train the super-resolution GIT and saved the resulting images. The following are a few of the images I am getting.
My query is whether the process I am following is correct. Do I need to train on more images to get images that correspond to the texts?
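Concretely, the second/third-stage training I ran looks roughly like the sketch below (adapted from the repo README; the checkpoint path, captions, tensors, and hyperparameters are placeholders, and the exact argument names may differ between versions):

```python
import torch
import torch.nn.functional as F
from torchvision.utils import save_image
from muse_maskgit_pytorch import VQGanVAE, MaskGit, MaskGitTransformer

# reload the VAE trained in the first stage (placeholder checkpoint path)
vae = VQGanVAE(dim = 256, vq_codebook_size = 512).cuda()
vae.load('./vae.pt')

transformer = MaskGitTransformer(
    num_tokens = 512,      # must match the VAE codebook size
    seq_len = 1024,        # fmap_size ** 2 at the target resolution (check against your VAE)
    dim = 512,
    depth = 8,
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    t5_name = 't5-small'   # T5 encoder for the conditioning texts
)

# super-resolution stage: conditioned on a low-res image as well as the text
superres_maskgit = MaskGit(
    vae = vae,
    transformer = transformer,
    image_size = 512,        # output resolution
    cond_image_size = 256,   # conditioning (low-res) image size
    cond_drop_prob = 0.25    # conditional dropout for classifier-free guidance
).cuda()

# my real run used 10 images with captions; random tensors stand in here
images = torch.randn(4, 3, 512, 512).cuda()
texts = ['placeholder caption'] * 4

# during training the low-res conditioning is derived from the images internally
loss = superres_maskgit(images, texts = texts)
loss.backward()

# after training: sample from text plus a downsampled conditioning image, then save
sampled = superres_maskgit.generate(
    texts = ['placeholder caption'],
    cond_images = F.interpolate(images[:1], 256),  # low-res conditioning input
    cond_scale = 3.                                # classifier-free guidance scale
)
save_image(sampled, './sample.png')
```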
Thanks!
@Mishra1995 ohh, @lonzi found a bug in the other issue, do you want to retry with 0.0.19?
also, what kind of results are you seeing with vqgan-vae?
Thanks @lucidrains, I have updated to 0.0.19 and am repeating the steps. I will update this thread once the training is complete!
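For reference, the upgrade itself was just (assuming the package is published on PyPI under the repo name):

```
pip install --upgrade muse-maskgit-pytorch==0.0.19
```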
Hi @Mishra1995, I'm also spending time training this. Would you be open to chatting as we both work on this?
@lucidrains I'm still training the VAE.
I'm training with the following configuration (sketched in code after this list):
- 50 images
- size = 64
- batch_size = 4
- grad_accum_every = 8
- num_train_steps = 50000
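In code, that configuration maps onto the trainer roughly like this (a sketch following the repo README; the image folder path is a placeholder, and argument names may have shifted between versions):

```python
from muse_maskgit_pytorch import VQGanVAE, VQGanVAETrainer

vae = VQGanVAE(
    dim = 256,
    vq_codebook_size = 512
)

trainer = VQGanVAETrainer(
    vae = vae,
    image_size = 64,          # size = 64 from the list above
    folder = './images',      # placeholder path to the 50-image folder
    batch_size = 4,
    grad_accum_every = 8,
    num_train_steps = 50000
).cuda()

trainer.train()
```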
Here are the results at 17,000 training steps.
Does there seem to be a problem with this training configuration, or is it supposed to improve around the 50,000th step?