stylegan3
betterze opened this issue · 18 comments
Dear Justin,
Thank you for creating this great repo. Do you have plans to support StyleGAN3?
Thank you for your help.
Best Wishes,
Alex
Hey, same question here :)
If you're not interested in doing this valuable work anymore I'd be happy to take over and maintain a new repo.
I was thinking of making a StyleGAN3 version. In particular, I have a WikiArt model trained and ready to share. But I'm not aware of many others right now. Do you have any links?
I'm training one on corgi images right now (see my repos). But I guess once there is a pretrained sg3 repo out there with an application form, people with models will approach you by themselves?
A Wikiart pkl pretrained on StyleGAN3 (compatible with all the new sg3 bells and whistles) would be a benevolent gift from the heavens at the moment.. Any model well-suited for finetuning really--the WikiArt models for sg2-ada have been super versatile for several experiments I've worked on in the past, but even a landscape model or room interiors or forests... Anything other than portraits and faces for finetuning.
Was kind of a huge bummer to see that was all that Nvidia released for the version 3 models (considering MSRP:Actual Cost on cards right now and the massive popularity of models like TADNE with SG2). I want to test drive its capability of being trained to generate depth images for translation into point clouds, among other generative encoding experiments, but I don't have $9000 for a couple of Quadro RTX 6000s atm. Been overkilling effort on dataset prep while I wait on someone with GPU muscle to share something I can actually use (you, Gwern, Arfafax, Eldaritch and many others in the community have genuinely helped keep people like myself from getting left behind due to prohibitive cost, whether it's knowledge or the models you all share--cannot thank you enough, seriously).
Anyway, dying to dig in to StyleGAN3, just don't have the GPU in 2021 to use it. Colab pro works well for finetuning, just don't have a relevant model to finetune on atm.
Keep an eye on: https://github.com/justinpinkney/awesome-pretrained-stylegan3
Hoping that if I get the approvals I can release the WikiArt model today (it's associated with my company's blog post).
Honestly though, I've tried fine-tuning a few models and the performance is generally much worse than stylegan2: training is more unstable and it takes longer to get good results (i.e. while stylegan2 tends to blend towards the new domain, stylegan3 tends to go through a phase where the whole thing looks rubbish, so fine-tuning takes much longer). These are just some casual observations, though.
I had been trying to train a 1024 landscape model but had given up because I couldn't get a stable training run.
I had several such failures myself at 512, and they took a very long time too. I've had a small amount of luck with transfer learning onto the 256 face model (transfer learning to full-body images of humans), but the faces never get any better after a certain point, and any of the redeemably decent results are probably due to the dataset I spent about a week putting together: u2net/rembg to remove the background of each bounding-box crop of the highest-scoring YOLOv4 'person' detection per image, across a collection of fashion datasets I put together, with each crop alpha blended and scaled, keeping aspect ratio, back onto a 256x256 white background, so that about 95% of the images have a single full-body person in the center of a square white image (N=~50k). The dataset needs to be manually cleaned of remaining images with just shirts/clothes etc. in them or ones with bad alpha blending/segmasking, but there were maybe 200-500 bad images in the whole dataset (and it still seems to generate them disproportionately often? idk..)
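The "scaled, keeping aspect ratio, back onto 256x256 white backgrounds" step above comes down to a little geometry plus a per-pixel blend. The following is only a minimal sketch of that arithmetic in plain Python, not the commenter's actual pipeline; the function names `fit_on_square` and `blend_over_white` are hypothetical, and a real pipeline would apply the same numbers through an image library.

```python
def fit_on_square(width, height, side=256):
    """Return (new_w, new_h, left, top) for centering a crop on a side x side canvas.

    The longer edge is scaled to `side`, so the aspect ratio is preserved.
    """
    scale = side / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    left = (side - new_w) // 2   # horizontal padding on the left
    top = (side - new_h) // 2    # vertical padding on the top
    return new_w, new_h, left, top


def blend_over_white(rgb, alpha):
    """Alpha-blend one RGB pixel over a white background (all values 0-255)."""
    a = alpha / 255
    return tuple(round(a * c + (1 - a) * 255) for c in rgb)


# A 400x800 person crop becomes 128x256, centered horizontally.
print(fit_on_square(400, 800))
# A fully transparent pixel turns white; a fully opaque one is unchanged.
print(blend_over_white((10, 20, 30), 0), blend_over_white((10, 20, 30), 255))
```

Bad segmentation masks show up in this step as grey halos, because partially transparent background pixels get mixed with white instead of disappearing.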
The 256x256 models are quite fast, like an order of magnitude faster to transfer learn than 512, but the results are meh.. I have this dataset in 512 and 1024 too (most original image resolutions were much larger than 1024, so the 512 and 256 are just downscales from the 1024), but I have been working on another experiment, an attempt to train on multiple domains in SG2-ADA PyTorch (southern hip hop + gangster rap album art with parental advisory logo, + anime combat scenes with firearms + hand-picked images from Gwern's Figures Crops dataset, N=~60k). I want to take what I learn from the multidomain experiment and apply it to training one of the later StyleGANs on depth images, then use something like pix2styl2pix to train again for translation into depth-registered RGB image generation, so the logical place to start was getting a full-body human generator working. If I had the GPU I'd give the 1024 faces model from SG3 a whirl on the dataset I created, but it was such a boondoggle. Weeks of training to get unusable results that promptly exploded--not fun lol:
Anyway, below is an interpolation and a mosaic after finetuning the SG3 ffhqu 256 model for an evening on Colab Pro (about 7 hours). Looks okay from a distance, and setting the trunc to .6 makes it much more bearable. The interpolation video was from an earlier attempt, only after like 90-100 kimg on the dataset I prepped, before I cleaned it a lot better and added more images, but the trunc 0.5 in that video made it at least bearable so that's an ok sign I suppose. The image mosaic is at 420 or so kimg with trunc at 1.0 (or whatever the default is for progress images). I followed some basic arithmetic from the paper on calculating gamma and put it to 3.2, and that appears to have been the magic sauce for this model--it kept it from exploding after 100 kimg or so (which it did about 4 times before it remained stable up through 400kimg). I'm considering just getting this model as good as it can get and then using an upscaler on the outputs for generating 2D-3D skeletons or 2D-3D 'neural texture' rendering etc:
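The comment above does not say which arithmetic produced gamma = 3.2. One heuristic that may be what is meant is the one used by the "auto" configuration in the StyleGAN2-ADA training script, where the R1 weight scales with the pixel count divided by the batch size. The sketch below assumes that heuristic; the function name is hypothetical, and the batch size behind the 3.2 figure is not stated in the thread.

```python
def r1_gamma_heuristic(resolution, batch_size, coeff=0.0002):
    """R1 gamma heuristic: coeff * (number of pixels) / (batch size).

    Assumption: this mirrors the StyleGAN2-ADA 'auto' config heuristic;
    it is a starting point for a sweep, not a guaranteed-stable value.
    """
    return coeff * resolution ** 2 / batch_size


# Starting points for a few resolution / batch-size combinations.
for res, batch in [(256, 16), (512, 32), (1024, 32)]:
    print(res, batch, round(r1_gamma_heuristic(res, batch), 4))
```

Because the result depends strongly on batch size, dropping the batch size to fit a smaller GPU changes the suggested gamma as well, which is worth keeping in mind when comparing runs on Colab against runs on larger cards.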
interpolation at about 100kimg per the previous description, trunc at .6:
Another run with a cleaner/larger dataset, up to about 400kimg. Results would probably look much better if I did another output at a lower trunc (might try that and post a result here). I think the default is 1 for the progress images during training for SG, which is what this is (not sure if that trunc default is correct, but it's certainly higher than .5 in the progress images, so it's likely this one would look way better when truncated properly).
Only having a low-end 10 series GPU and Colab in 2021 and being a CV/ML dev on a budget feels not unlike trying to build a city but only having a garden hose for the water supply lol. Pretrained models for finetuning are extremely useful. Thanks again for your previous efforts and consideration here!
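For readers wondering why a lower trunc makes samples "more bearable": the truncation trick pulls each latent w toward the average latent, trading variety for fidelity. Here is a minimal sketch in plain Python, using lists in place of tensors; the function name `truncate` is hypothetical and the numbers are illustrative only.

```python
def truncate(w, w_avg, psi):
    """Truncation trick: interpolate from the average latent toward w.

    psi = 1.0 leaves w unchanged; psi = 0.0 collapses every sample to w_avg.
    """
    return [avg + psi * (x - avg) for x, avg in zip(w, w_avg)]


w_avg = [0.0, 0.0]
w = [2.0, -2.0]
print(truncate(w, w_avg, 1.0))  # untouched latent
print(truncate(w, w_avg, 0.5))  # halfway toward the average
```

Outlier latents are where an under-trained generator tends to produce its worst images, so shrinking them toward the mean hides many failures at the cost of diversity.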
Awesome, thanks for all the details. Here's the repo for stylegan 3:
https://github.com/justinpinkney/awesome-pretrained-stylegan3
I'll add more models as I find them or train them
you are THE BEST. Thank you!!!
This is like an early xmas, I'm out of town on work for a week with nothing to do so this literally made my week, thank you so much!
Now that I have my process somewhat down for calculating gamma, I may do some test runs on colab with this one then buy a day or two of some v100 juice on Lambda labs cloud once it's dialed-in. This likely resolves countless months of slow, uncertain failures--now I have a baseline I can use for testing/finetuning/performance vs SG2-ADA and be able to figure out the sweet spots. Also now have a path to justify spending a little money on some cloud-accelerated experiments, this is genuinely exciting!!
Sorry for fanboying a bit here, but the WikiArt model on SG2-ADA really empowered a ton of my own personal projects--it allowed me to get good enough results from finetuning on several different domains, and to learn enough about StyleGAN in general to even use it effectively. The more effective I can manage to be with training, the more I can justify upgrading my cloud resources to paid services. You sharing these models reaches people who maybe can't buy hardware but might be able to afford better-than-Colab GPU on cloud, yet might avoid doing that since they don't have a proven test to justify a purchase. Your efforts are an investment that benefits both public and private; I cannot show enough gratitude, and if you have any suggestions for cloud resources I'm all ears.
Thanks again!!
-Gene
Nice! Thanks for the kind words! I am still a little unsold on stylegan 3 though. I've been trying to train a 1024 model on lhq (landscape photos) but have really struggled to avoid training breaking down. I'll probably give up soon and share what I have so far.
Also one note about the wikiart model is that I forgot the default number of mapping layers is 2, I wish I'd switched it to 8 as I think that would have done much better for such a complex dataset. I guess I might fine tune a variant like that one day.
For what it's worth, sg3 seems VERY finicky about gamma. I know the other versions also require special attention to gamma, but with sg2-ada there are tricks/hacks to recover progress from an off-the-rails training session. Whenever SG2-ada starts generating junk on a diverse dataset, I try to turn gamma down to 1 and disable aug and xflips for ~100-300 kimg, or until it starts generating somewhat boring/simple/plain outputs that make "logical" visual sense for the target domain/domains. (If it's an anime or other character dataset, for example, and the model drifted into not having faces or bodies in too many of the outputs, wait until you see discernible faces/characters in most of the images. Even if they are plain/bad and the model looks like it's just getting worse, just get it to start generating anything that technically could represent a character.) Then crank gamma really high (like 50) for another 30kimg or so before setting it back to whatever the arithmetic from the paper requires for your model.
About 50% of the time on sg2ADA the model starts generating 'sharpened' outputs in the target domain again and training progress sort of just starts working again.
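The sg2-ada recovery routine described above can be written down as a three-phase schedule. This is only a sketch for keeping track of the phases, not a training script: the `Phase` class and `recovery_schedule` function are hypothetical, the 200 kimg default is an arbitrary midpoint of the ~100-300 kimg range, and keeping aug/xflips off during the high-gamma phase is an assumption, since the comment does not say when they are switched back on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase:
    name: str
    gamma: float
    kimg: Optional[int]  # None means "run until satisfied"
    augment: bool
    xflip: bool


def recovery_schedule(target_gamma, calm_kimg=200, shock_kimg=30):
    """Phases for nudging a derailed sg2-ada run back on track."""
    return [
        # Low gamma, no aug/xflips, until outputs are plain but sensible.
        Phase("calm", 1.0, calm_kimg, augment=False, xflip=False),
        # Short burst of very high gamma (aug/xflips off here is an assumption).
        Phase("shock", 50.0, shock_kimg, augment=False, xflip=False),
        # Back to the gamma computed for the model, normal settings.
        Phase("resume", target_gamma, None, augment=True, xflip=True),
    ]


for phase in recovery_schedule(target_gamma=3.2):
    print(phase)
```

Each phase would correspond to resuming from the previous phase's last snapshot with the changed settings; the first phase's length is really a visual judgment call rather than a fixed kimg count.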
SG3? I've had the model explode, resumed from checkpoints over 1000kimg ahead of the explosion, adjusting every param I can think of and nothing has an impact. Still explodes at the exact same place unless I start completely over with a specific set of hyperparams. It is not easy to figure this one out, but I am still drooling over the interpolation features on this new model. Idk
Idk if it's a thing on sg2 but the ability to move 'camera' position within a model interpolation as well as rotate elements in the image looks so incredibly useful. 3D or video pipelines are one thing, but for instance generating video samples that 'dance' to the music, from different genres of album art, to use in a context like VJing in Resolume for live music seems like an achievable goal.
The appearance of enhanced elements of "control interfacing" with SG3 is what has me curious about it, seems like it'd be a great vector for a lot of utility. It's just been mad difficult to figure out how to train it--hoping to have some tricks/hacks to figure out some of the sweet spots to get some heady models trained on it soon. Again, having something that can get me off the ground for fine-tuning was the missing piece--thanks again.
Hi @WyattAutomation, do you mind pointing me to the sg2-ada WikiArt pretrained model? The one listed here is trained on sg2 and Google didn't help me.
Thanks.
I think this is the right one, let me know if it doesn't work: https://drive.google.com/file/d/1-5xZkD8ajXw1DdopTkH_rAoCsD72LhKU/view
Thanks @WyattAutomation, I am able to download and use it, though this seems to be a 1024x1024 model and I was looking for a 256x256 one (for transfer learning), which is easier to manage on Colab.
Any luck with the LHQ 1024 model? I've also been trying to train StyleGAN3 on this dataset and it is indeed proving quite tricky...
Are you planning to share the 1024 weights that you have so far? They could be used as a starting point to play with gamma values as noted above. P.S.: Your LHQ 256 works like a charm, many thanks!!
@nausithoe I could not get it to work. Not sure it's worth sharing my weights as I feel like they are "cursed"; it might be better to start from scratch to avoid the collapse. The outputs aren't really of notable quality anyway. I might revisit with a different landscape dataset, as LHQ is a bit too varied and maybe something more consistent would work. I've seen a bunch of people successfully train a landscape model (but no one willing to share weights, unfortunately).
Thanks, I'm trying it out. If anything robust comes out of it I'll share it here. Should you have any insights on the hyperparameter tuning, they would be welcome (I'm experimenting with different gammas and numbers of mapping layers).