Not really an issue
burb0 opened this issue · 13 comments
I'm sorry if this is more a question than an actual issue. I'm a newbie at this, but if I understood what you did here: if I were to modify (say, add noise to or flip) the images under "datasets: train: .. paths: path_to_images" in the train_ffhq_glean.yml file, then to maximize the model's learning, should I unpack stylegan2-ffhq-config-f.pkl, apply the same modification to all of its images, and pack it back up as a pkl?
If I've missed something or I'm completely off, I'd be glad to hear from you.
Also, thank you for sharing your nice work!
Hey sure thing. I'm working on doing a better job documenting what I built here, but it's slow going. Research ideas generally come first, and there's always more of them. :)
stylegan2-ffhq-config-f.pkl doesn't have any images in it. It is a pretrained model from NVIDIA, so I'm not quite sure how you would "apply a modification" to it. Can you go into more detail there?
I think the root of your question is about adding differentiable image augmentation to the discriminator inputs? I haven't implemented anything like that yet since it doesn't make a lot of sense for image SR (at least - I don't think it does). It should be fairly easy to do though - it would just take the form of an injector that applies whatever kornia augmentations you like.
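For concreteness, here's a minimal sketch of what such an injector could look like. The constructor signature, the opt/state conventions, and the key names are assumptions modeled on a typical injector-style interface, not this repo's exact API, and the particular kornia augmentations are just examples:

```python
import torch.nn as nn
import kornia.augmentation as K

class KorniaAugmentationInjector(nn.Module):
    """Hypothetical injector: reads an image tensor from the training
    state, applies differentiable kornia augmentations, writes it back."""
    def __init__(self, opt, env):
        super().__init__()
        self.input_key = opt['in']    # e.g. the key for discriminator inputs
        self.output_key = opt['out']
        # kornia augmentations are plain nn.Modules operating on (B,C,H,W)
        # tensors, so gradients flow back through them.
        self.augs = nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            K.ColorJitter(brightness=0.1, contrast=0.1, p=0.5),
        )

    def forward(self, state):
        return {self.output_key: self.augs(state[self.input_key])}
```

Because the augmentations are differentiable, they can sit in front of the discriminator without blocking gradients to the generator, which is the property DiffAugment-style training relies on.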
Sorry for bugging you again, but I couldn't find how to use more than 12 GB of GPU memory for training. Is 12 GB the maximum usable memory, or am I missing something?
Thanks again!
If you want to use more memory, just scale up the batch size (or turn down the "mega_batch_factor"). In my experience, larger batch sizes are almost always better for generative models.
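To illustrate the relationship, here is a sketch of the general gradient-accumulation technique a mega-batch factor implies (not this repo's exact training loop; the function and argument names are hypothetical). The batch is split into chunks, so peak activation memory scales with batch_size / mega_batch_factor while the effective batch size is unchanged:

```python
import torch

def train_step(model, loss_fn, optimizer, batch, mega_batch_factor=2):
    # Only batch_size / mega_batch_factor samples produce activations
    # on the GPU at any one time.
    optimizer.zero_grad()
    for chunk in torch.chunk(batch, mega_batch_factor, dim=0):
        loss = loss_fn(model(chunk)) / mega_batch_factor
        loss.backward()  # gradients accumulate across chunks
    optimizer.step()     # one step per full (mega) batch
```

So a lower mega_batch_factor means bigger chunks and more memory used per step; a higher batch size does the same while also changing the gradient statistics.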
Out-of-the-blue question: if you had a limit on training time, would you sacrifice training dataset size or batch size? I mean, is it better to have a larger batch with a smaller dataset, or a smaller batch with a bigger (let's say 10% bigger) dataset? (Or, of course, it depends.)
Do you plan on adding a test_config.yml for GLEAN? I'm having trouble testing. I'm getting black images as results, and I feel it could literally be anything.
If you're getting black images, check what is being output while training. It should be under experiments/<your_training_name>/visual_dbg/gen. Are those all black too? If so, you probably have other problems.
I'm getting clear images in /visual_dbg/gen. I'm testing with python test.py -opt <my-glean-config>.yml as described in your work, so I'm probably not testing the model as I should, and giving it the wrong parameters as input.
I'm going to try testing in raw PyTorch as you suggested.
EDIT: it worked, thank you for your suggestion. Simplicity rocks
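For anyone else who lands here: a bare-bones raw-PyTorch test loop could look like the sketch below. The import path, the constructor arguments, and the checkpoint path are all assumptions; match them to how the network is built in your training yml.

```python
import torch
from PIL import Image
import torchvision.transforms.functional as TF

# Hypothetical import and constructor args - mirror your training config.
from models.glean.glean import GleanGenerator

device = torch.device('cuda')
model = GleanGenerator(nf=64, input_dim=32)  # args assumed for illustration
model.load_state_dict(torch.load('experiments/my_training_name/models/latest_gen.pth'))
model = model.eval().to(device)

# Load a low-res input, run it through the generator, save the result.
lr = TF.to_tensor(Image.open('lr_input.png')).unsqueeze(0).to(device)
with torch.no_grad():
    sr = model(lr)
TF.to_pil_image(sr.squeeze(0).clamp(0, 1).cpu()).save('sr_output.png')
```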
Sorry to bother you again. I trained and tested your GLEAN implementation and it really works well. Now I'm thinking of pushing the training even further by putting some 16x16 images in the training set. But as you may know, this error pops up when I try to run training:
File "*/codes/models/glean/glean.py", line 112, in forward assert self.input_dim == x.shape[-1] and self.input_dim == x.shape[-2] AssertionError
I suppose this error is due to the new input size, and also because stylegan2-ffhq-config-f.pth isn't "configured" to take such images as input. Am I wrong? More importantly, if so, how can I train the model with 16x16 images?
Little update: I modified the initial structure of the model as you suggested in 1) and made some other modifications to adapt it to 16x16-pixel inputs. I changed the encoder's "reductions" parameter from 4 to 3 (since I no longer need to reduce the 32x32 layer), set input_dim=16 and encoder_rrdb_nb=5, and passed "2x the number of filters" directly into the generator classes, since I had to retrain the whole network anyway: GleanEncoder(2*nf, ...) and GleanDecoder(2*nf, ...).
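The reductions change is just halving arithmetic: each reduction halves the spatial size, so a 16x16 input needs one fewer reduction than a 32x32 input to bottom out at the same resolution of the encoder pyramid. A quick sanity check (the bottom resolution of 2 is an assumption for illustration, not pulled from the repo):

```python
# Each encoder reduction halves the spatial dimensions.
def final_res(input_dim, reductions):
    return input_dim // (2 ** reductions)

# 32x32 with 4 reductions and 16x16 with 3 reductions end at the same
# resolution, so the layers below the encoder are unaffected.
assert final_res(32, 4) == final_res(16, 3) == 2
```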
I left everything else the same and it seems to work, at least structure-wise. It didn't throw any errors, and the training is still going and generating some images. The perceptual quality isn't great yet.
I was sceptical at first because I left the latent_bank_blocks parameter at 7. Is that parameter not so influential, given your comment that "Note that latent levels and convolutional feature levels do not necessarily match, per the paper", or am I missing something?
Also, could you tell me why I need to scale the number of filters by 2x? I think I've got it, but I'm afraid I'd get tangled up in my own words trying to explain it.