Not really an issue

Question

Not really an issue

burb0 opened this issue 4 years ago · 13 comments

I'm sorry if this is more a question rather then an actual issue. I'm a newbie at this but, if I understood what you did there, if I were to modifiy (let's say add noise or flip) the images of "datatsets: train: .. paths: path_to_images" of the train_ffhq_glean.yml file, to maximize the learning of the model I should unpack the stylegan2-ffhq-config-f.pkl, apply the same modification as above to all the images and then pack it as pkl?
If I miss something or I'm completely off I would be glad to hear from you
Also thank you for sharing your nice works!

Answer 1 · 2020-12-31T02:08:21.000Z

Hey sure thing. I'm working on doing a better job documenting what I built here, but it's slow going. Research ideas generally come first, and there's always more of them. :)

stylegan2-ffhq-config-f.pkl doesn't have any images in it. It is a pretrained model from NVidia so I'm not quite sure how you would "apply a modification" to it. Can you go into more details there?

I think the root of your question is about adding differentiable image augmentation to the discriminator inputs? I haven't implemented anything like that yet since it doesn't make a lot of sense for image SR (at least - I don't think it does). It should be fairly easy to do though - it would just take the form of an injector that applies whatever kornia augmentations you like.

Answer 2 · 2021-01-14T16:08:32.000Z

Sorry for bugging you again, but I didn't find how to use more then 12 GB of GPU for training. Is 12 the maximum usable memorz, or am I missing something?
Thanks again!

Answer 3 · 2021-01-14T16:30:04.000Z

If you want to use more memory just scale the batch size (or turn down the "mega_batch_factor"). In my experience, larger batch sizes are almost always better for generative models.

Answer 4 · 2021-01-14T17:59:40.000Z

Out of the blue question: if you would have limitation on training time, would you sacrifice traning dataset size or batch size?I mean, is it better to have a larger batch with smaller dataset or smaller batch with bigger(lets say 10% bigger) dataset? (or of course, it depends)

Answer 5 · 2021-01-14T18:42:10.000Z

In the article I linked above, I assume you have a very large dataset. SR data is very easy to come by so that's a fair assumption. If your dataset is small, smaller batch sizes make sense because the stochastic nature of SGD resists overfitting.

…

On Thu, Jan 14, 2021 at 10:59 AM burb0 ***@***.***> wrote: Out of the blue question: if you would have limitation on training time, would you sacrifice traning dataset size or batch size?I mean, is it better to have a larger batch with smaller dataset or smaller batch with bigger(lets say 10% bigger) dataset? (or of course, it depends) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGLMOTCJQDR74BIRT7BDFTSZ4WJZANCNFSM4VO4TN7A> .

-- - James Betker

Answer 6 · 2021-01-27T14:44:45.000Z

Do you plan adding a test_config.yml for glean? I'm having trouble testing. I'm getting black images as results and I feel it could literally be anything

Answer 7 · 2021-01-27T20:02:47.000Z

Hi, I don't have any intent to add sample test configs. At some point my plan is to break out all of the scripts in DLAS and make it a training library only. Testing should be simple using raw pytorch. Something like: model = GleanGenerator(<your config>).to(<device>) model.load_state_dict(<your saved weights>) img = torchvision.transforms.functional.to_tensor(PIL.Image.open(<your lq image>)).unsqueeze(0).to(<device>) hq=model(img) torchvision.utils.save_image(hq, "out.png")

…

On Wed, Jan 27, 2021 at 7:45 AM burb0 ***@***.***> wrote: Do you plan adding a test_config.yml for glean? I'm having trouble testing. I'm getting black images as results and I feel it could literally be anything — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGLMOT5FJPUK6FHFNO45CDS4ARG7ANCNFSM4VO4TN7A> .

-- - James Betker

Answer 8 · 2021-01-27T20:04:10.000Z

If you're getting black images - check what is getting output while training? Should be under experiments/<your_training_name>/visual_dbg/gen. Are those all black too? If so you probably have other problems.

Answer 9 · 2021-01-28T09:02:46.000Z

If you're getting black images - check what is getting output while training? Should be under experiments/<your_training_name>/visual_dbg/gen. Are those all black too? If so you probably have other problems.

I'm getting clear images on /visual_dbg/gen. I'm using python test.py -opt <my-glean-config>.yml as described in your works, so probably I'm not testing the model as I should giving it wrong parameters as input for testing.
I'm going to try testing in row Pytorch as you suggested

EDIT: it worked, thank you for your suggestion. Simplicity rocks

Answer 10 · 2021-02-02T20:39:26.000Z

Sorry to bother you again, I trained and tested your glean implementation and it really works well. Now I was thinking to push the traning even further by putting some images of dimension 16x16 in the training set. But as you may know this error pops ups while I try to run the training model:
File "*/codes/models/glean/glean.py", line 112, in forward assert self.input_dim == x.shape[-1] and self.input_dim == x.shape[-2] AssertionError
I suppose this error is due to the new input size and also because the "stylegan2-ffhq-config-f.pth" isn't "configured" to take as input such images. Am I wrong? More importantly, If so, how can I train the model with 16x16 images?

Answer 11 · 2021-02-02T20:54:41.000Z

You're correct - The GLEAN model (from the paper) must be tightly bound to an input dimension since the encoder needs to directly hook into the StyleGAN2 latent bank at the proper dimensions. It sounds like you trained a model with a 32x32 input size so that is what it is restricting to you to. You have a couple of options: 1. Take your current model and chop off the top encoder layer. You'll need to re-train initial_conv to output 2x the number of filters and the decoder will no longer receive a passthrough input from that dimension so you'll need to alter those connections as well. You should be able to get away with doing this and only re-training your new layers while keeping the rest of the model frozen. Since you're training the first layer, fine-tuning will take awhile (gradients are normally much smaller in the first layers than the last..) 2. You can just interpolate your 16x16 input into a 32x32 input and fine-tune the model on that interpolated input. This should be faster to get up and running than (1) but will be computationally inefficient since you are rendering the first encoder layer a no-op. 3. Unlink the size requirements (remove the assert) and see what happens. It might be fine. In my experience, GLEAN is pretty fragile in this regard, though.

…

On Tue, Feb 2, 2021 at 1:39 PM burb0 ***@***.***> wrote: Sorry to bother you again, I trained and tested your glean implementation and it really works well. Now I was thinking to push the traning even further by putting some images of dimension 16x16 in the training set. But as you may know this error pops ups while I try to run the training model: File "*/codes/models/glean/glean.py", line 112, in forward assert self.input_dim == x.shape[-1] and self.input_dim == x.shape[-2] AssertionError I suppose this error is due to the new input size and also because the "stylegan2-ffhq-config-f.pth" isn't "configured" to take as input such images. Am I wrong? More importantly, If so, how can I train the model with 16x16 images? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGLMOXSN2BJ6TLW6ZHSG2TS5BPI7ANCNFSM4VO4TN7A> .

-- - James Betker

Answer 12 · 2021-02-13T11:59:07.000Z

Little update: I modified the initial structure of the model as you suggested in 1) and did some other modification to it, trying to adapt it to the input being images of 16x16 pixel. I modified the encoder "reductions" parameter from 4 to 3 (since I don't need to reduce the 32x32 layer anymore), input_dim=16,encoder_rrdb_nb=5 and in I passed the "2x the number of filters" directly in the generator class since i had to train all the network again so that would be GleanEncoder(2nf,.... ) and GleanDecoder(2nf..).
I left everything else the same and it seems to work, at least structure wise. It didn't throw any errors and the training is still going, generating some images. The perceptual quality isn't great yet.
I was sceptical at first because I left the latent_bank_blocks parameter at value 7. Is that parameter not so influentual since as you suggested in the comment "Note that latent levels and convolutional feature levels do not necessarily match, per the paper" or am I missing something?
Also, could you tell me why i need to output the number of filters to 2x times? I think I got it, but I'm afraid I could get entangled in my words in trying to explain it

Answer 13 · 2021-02-13T16:30:56.000Z

latent_bank_blocks is described briefly in the paper on page 3, it describes the column-wise shape of C. C is later chunked apart to create the latents you feed into the latent bank. If you reduce the number of latent bank layers you are using and don't modify this value, there should be no real problem - you are just wasting computation since you never use a chunk of C. Not sure if that makes sense. Basically with this parameter if it is not throwing errors at runtime, it is not important. Scaling it up or down will have no effect on model quality. Sorry, the "2x" statement was merely if you wanted to use your existing trained model. This is a really finicky process, I wouldn't recommend it unless you really have some time and patience. It sounds like you are training from scratch - you don't need to do this then.

…

On Sat, Feb 13, 2021 at 4:59 AM burb0 ***@***.***> wrote: Little update: I modified the initial structure of the model as you suggested in 1) and did some other modification to it, trying to adapt it to the input being images of 16x16 pixel. I modified the encoder "reductions" parameter from 4 to 3 (since I don't need to reduce the 32x32 layer anymore), input_dim=16,encoder_rrdb_nb=5 and in I passed the "2x the number of filters" directly in the generator class since i had to train all the network again so that would be GleanEncoder(2*nf,.... ) and GleanDecoder(2*nf..). I left everything else the same and it seems to work, at least structure wise. It didn't throw any errors and the training is still going, generating some images. The perceptual quality isn't great yet. I was sceptical at first because I left the latent_bank_blocks parameter at value 7. Is that parameter not so influentual since as you suggested in the comment "Note that latent levels and convolutional feature levels do not necessarily match, per the paper" or am I missing something? Also, could you tell me why i need to output the number of filters to 2x times? I think I got it, but I'm afraid I could get entangled in my words in trying to explain it — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGLMOSSF5PIIUC2LTJS6Y3S6ZSRZANCNFSM4VO4TN7A> .

-- - James Betker