pkuliyi2015/multidiffusion-upscaler-for-automatic1111

Comparison discussion

x-legion opened this issue · 92 comments

MultiDiffusion seems to be doing worse (not sharp), or am I doing something wrong?
original:
image

MultiDiffusion:
image
Ultimate SD Upscale:
image

Hello, would you please provide the weights (including the checkpoint and LoRA, if you used one) for your original image? I need them to reproduce your results in an oil-painting fashion. The MultiDiffusion results can be severely affected by the model checkpoint and LoRA you used.

But generally speaking, an extraordinarily high CFG scale and a slightly higher denoising value will give you satisfying details. Example positive prompts are "highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, extremely clear, very clear, ultra-clear". You don't need any concrete things in the positive prompt; just drag the CFG scale to an extra-large value. Denoising values between 0.1 and 0.4 are all OK, but the content will change accordingly.
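For anyone who prefers scripting this over clicking through the UI, here is a minimal sketch of those settings sent through the stock AUTOMATIC1111 img2img web API. The endpoint and payload fields are the standard webui ones; the file name, server address, and exact sampler string are placeholders, and the extension's own tiling options (set in its UI panel) are omitted.

```python
# Minimal sketch: high-CFG, low-denoise img2img pass via the A1111 web API.
import base64
import requests

with open("input.png", "rb") as f:          # placeholder input image
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": ("highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, "
               "extremely clear, very clear, ultra-clear"),
    "negative_prompt": "EasyNegative",
    "sampler_name": "DPM++ SDE Karras",      # sampler naming may differ by webui version
    "steps": 24,
    "cfg_scale": 20,                         # unusually high CFG, as suggested above
    "denoising_strength": 0.3,               # 0.1-0.4; higher values change content more
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
r.raise_for_status()
```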

Here is my result with CFG=20, Sampler=DPM++ SDE Karras, and denoising strength=0.3, for example. As I use the protogenX34 checkpoint, my painting style will be wildly different from yours:

00064-2792530863-20230307100606

Please comment on this issue if you find your results have significantly improved after using a proper model and CFG values.

Hi there, I'll write here rather than create a new issue about a similar thing.
Would it be possible to write down or screenshot all the settings used to upscale the picture attached in the extension description? I think I've tested everything, but all I get is a blurry upscaled picture. Here is one example result that shows how blurry it is (not to mention the lack of extra details with denoise at 0.3 and CFG at 20, for example). At the moment I want to copy everything 1:1 to see whether the issue is on my side or not. Thanks for creating this extension - I have high hopes.
Example picture.

Hello, as you wish I provide the PNG info:
image

Here is the text version for your convenience. All resources are public, but I'm quite busy and cannot provide the links.

masterpiece, best quality, highres, extremely detailed 8k unity wallpaper, ultra-detailed
Negative prompt: EasyNegative
Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1614054406, Size: 4096x3200, Model hash: 2ccfc34fe3, Model: 0.9(Gf_style2) + 0.1(abyssorangemix2_Hard), Denoising strength: 0.4, Clip skip: 3, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 4, MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, MultiDiffusion overlap: 64

If you don't know any of them, you can Google it. But your result is likely to come from poor positive and negative prompts; I use a Textual Inversion called EasyNegative from civitai.com.

Click Here for Better Comparison View

original
image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ใƒใ‚คใƒ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3857533696, Size: 640x960, Model: dreamniji3fp16, Clip skip: 2, ENSD: 31337, Discard penultimate sigma: True

Ultimate SD upscaler
image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ใƒใ‚คใƒ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, Ultimate SD upscale upscaler: 4x_foolhardy_Remacri, Ultimate SD upscale tile_width: 768, Ultimate SD upscale tile_height: 768, Ultimate SD upscale mask_blur: 8, Ultimate SD upscale padding: 32, Discard penultimate sigma: True

MultiDiffusion
image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ใƒใ‚คใƒ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 2, Discard penultimate sigma: True

OK, now I know it might be something wrong on my side. I can see additional details (I'll check whether that's because of Clip skip 3, the upscaler, or something else), but it's still blurred. That's super weird - ahh, and thanks for the reply. The pictures attached to the description don't have PNG info attached (that's why I asked :) )
00147-1803174913

Hello, thanks for your interest in this work. I tried for several minutes on your image and here is my result with no tuning:
https://imgsli.com/MTYxMDI5.

It's hard to tell which is better; if you like illustration-style sharpness and faithfulness to the original image, maybe Ultimate SD Upscaler + 4x-UltraSharp is your best choice. But personally I'd like to see some fabricated details on a realistic human face, so I prefer this tool.

It's noteworthy that the biggest difference between MultiDiffusion and other upscalers is that it currently doesn't support any concrete content in the prompt when you upscale an image; otherwise each tile will contain a small character and your image will end up blurry and messy.

The correct prompt is just as follows. I don't even use LoRA:

image

And my configurations, FYI:

image

I provide the PNG info

I tried to replicate your settings with an image provided by OP and it's still very blurry:

image

Compared to an image you sent:

image

As you can see, settings are pretty much the same except CFG scale:

image

Update: Oh, I just noticed that EasyNegative is a textual inversion from civitai.com; it is not a word. Please download that textual inversion.

Here is the link: https://civitai.com/models/7808/easynegative

The upscalers are important too. I personally use two: 4x-UltraSharp and 4x-Remacri. Here is the link:
https://upscale.wiki/wiki/Model_Database
There you can find the two upscalers; put them in your ESRGAN folder.

4x-remacri

I used it with the image above

EasyNegative is a textual inversion

Already downloaded this embedding

4x-remacri

I used it with the image above

Do you use EasyNegative embeddings?

You mean you have used it in the above images?

You mean you have used it in the above images?

Yes, it was used

UPD:

image

You mean you have used it in the above images?

Yes, it was used

UPD:

I spent some time finding the original PNG info. Here it is; please try to reproduce using my params:
image

It may not be as easy to use as the Ultimate Upscaler, as it's essentially a complete redraw without post-processing. Personally, I have some intuitions for using it:

  • No concrete positive prompts. Just something like clear, very clear, ultra clear
  • Don't use too large a tile size, as SD 1.4 is only good at 512 - 768 (so divide by 8 to get 64 - 96; see the quick sketch after this list).
  • Large CFG scales, Euler a & DPM++ SDE Karras, Denoising=0.2-0.4
  • Try both 4x-UltraSharp and 4x-Remacri
  • Clip Skip=2 or 3 is worth trying.
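As a quick illustration of the tile-size point, the pixel-to-latent arithmetic is just a division by the VAE downscale factor (8 for SD 1.x):

```python
# Sketch of the pixel-to-latent conversion mentioned in the list above.
# SD 1.x works in a latent space downscaled 8x, so 512-768 px tiles
# correspond to latent tiles of 64-96.
def latent_tile_size(pixel_size: int, vae_downscale: int = 8) -> int:
    return pixel_size // vae_downscale

for px in (512, 640, 768, 1024):
    print(px, "px ->", latent_tile_size(px), "latent")
# 1024 px -> 128 latent, usually too large for SD 1.x checkpoints
```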

please try to reproduce using my params

I just did it and it's a lot better

image

Settings (Even seed is the same):

image

But it still can't generate a result as good as yours.
I know it depends highly on hardware, but there's a very large difference in details.
No optimizations are used (such as xformers, opt-split-attention, etc.).

My:
image

And yours:
image


I'm also confused. Are you using this model?

https://civitai.com/models/3666/protogen-x34-photorealism-official-release

I see our model hashes are different. Apart from this, I couldn't find anything else.

I'm also confused. Are you using this model?

Yes, I used protogen_x3.4, but pruned.
Now I downloaded the 5 GB version with the same hash as yours and THAT'S AMAZING.

A huge improvement in details:

image

It still doesn't produce the exact same result as yours (I guess it depends on hardware), but the details are unbelievable; I can clearly see the stitch seam on the sleeve.

Oh, thanks for your feedback. I didn't know that a pruned model could affect the details until you tested it.

Ohh! I think not many people know that, to be honest o_O. As far as I understand pruning, it shouldn't affect a task like upscaling via small tiles? I'm going to try with the non-pruned model as well and let you know.

Edit: No clue why, but today everything works as it should. Maybe you need to turn everything off and on again, not just restart the UI - just like when installing Dreambooth.

Tried it, and to be honest, ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos - unless there are going to be examples with Lanczos where it introduces new details? The best bet is to just upscale with ESRGAN by 2x, then go to inpaint and mask the parts one by one to upscale them, since you'll have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.

More tests. ControlNet doesn't work, or it needs a much lower denoise than I used.
The attached upscale was done in two passes plus the dynamic CFG script - agreed, it's way off from the original picture, but now that I know what and where, it's time for fine tuning (and hopefully figuring out the ControlNet issue).
00034-715773611 - Copy
Indeed, it's essential to test a couple of upscalers because the differences are huge - even bigger than the SD model used.

23,03,10 - 16,01,21 - 7331 a
Left is mine, right is pkuliyi2015's.
As you can see, the left has way more details, but also some noise and weird issues - pure Remacri 4x looks almost like pkuliyi2015's version. Plenty of room for tests.

Tried it, and to be honest, ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos [...]

This is basically a tile-by-tile img2img SD redraw, so if you don't give it high strength it won't work as you expect. However, one of the weaknesses is that it currently cannot automatically map your prompts to different areas... If you could use stronger prompts, it would be way better.

But I'm working on Automatic Prompt Mapping. In img2img, it works by first estimating the attention map of your prompt over the original picture and then re-applying it to the MultiDiffusion tiles. In txt2img this may be similar, but I need time to do it.
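For anyone wondering what the tile-by-tile redraw amounts to mechanically, here is a rough NumPy sketch of the fusion step: at each denoising step, the per-tile model outputs are accumulated and averaged back into one latent. Uniform weights are used here for simplicity; the extension itself may weight overlapping regions differently.

```python
import numpy as np

def fuse_tiles(latent_hw, tiles, outputs):
    """Average overlapping tile outputs back into one latent.

    latent_hw : (H, W) of the full latent
    tiles     : list of (y, x, h, w) boxes in latent coordinates
    outputs   : list of per-tile model outputs, each shaped (C, h, w)
    """
    c = outputs[0].shape[0]
    acc = np.zeros((c,) + latent_hw, dtype=np.float32)
    weight = np.zeros((1,) + latent_hw, dtype=np.float32)
    for (y, x, h, w), out in zip(tiles, outputs):
        acc[:, y:y + h, x:x + w] += out
        weight[:, y:y + h, x:x + w] += 1.0
    return acc / np.maximum(weight, 1e-8)   # average where tiles overlap
```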

https://github.com/dustysys/ddetailer.git try this one


I'm sorry for the accidental wrong edit.


The key point is that I need a user interface for drawing bounding boxes, so that you can draw rectangles and control MultiDiffusion with different prompts. That way the results should get much better.

Why? Because that way you can just select the woman's face and tell SD to draw a beautiful woman's face. Then SD will try its best, using its 512 * 512 resolution to ONLY draw a face. The resolution will be unprecedentedly high for SD models, since the model dedicates itself to drawing only one part of the image to the best of its ability.
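To make the idea concrete, here is a purely illustrative sketch of what per-region prompting boils down to: each rectangle carries its own prompt, and tiles falling inside it are conditioned on that prompt instead of the global one. The field names and the lookup function are hypothetical, not the extension's real schema.

```python
# Hypothetical region list: (x1, y1, x2, y2) as fractions of the image.
regions = [
    {"bbox": (0.30, 0.10, 0.70, 0.45), "prompt": "a beautiful woman's face, detailed eyes"},
    {"bbox": (0.00, 0.60, 1.00, 1.00), "prompt": "flowing dress, intricate fabric folds"},
]

def prompt_for_tile(tile_center, regions, global_prompt):
    """Pick the prompt for a tile based on the region its center falls in."""
    cx, cy = tile_center
    for region in regions:
        x1, y1, x2, y2 = region["bbox"]
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return region["prompt"]
    return global_prompt   # fall back to the global prompt elsewhere

print(prompt_for_tile((0.5, 0.3), regions, "masterpiece, best quality"))
```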

However, when I was adding features I saw this f**king issue:
gradio-app/gradio#2316

Someone submitted a PR for a bbox tool, but the maintainers declined to merge it:
gradio-app/gradio#3220

I don't know what they were thinking in denying such a good PR (from my perspective) while not providing their own solution. It has been half a year since it was first proposed.

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other ideas?

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other idea?

Check out this extension: https://github.com/hnmr293/sd-webui-llul

It fakes it by having you move around a rectangle in a separate window.

image

https://www.reddit.com/r/StableDiffusion/comments/11pyiro/new_feature_zoom_enhance_for_the_a111_webui/

New Feature: "ZOOM ENHANCE" for the A111 WebUI. Automatically fix small details like faces and hands!

Hello, fellow Stable Diffusion users! I'm excited to share with you a new feature that I've added to the Unprompted extension: it's the [zoom_enhance] shortcode.

If you're not familiar with Unprompted, it's a powerful extension that lets you use various shortcodes in your prompts to enhance your text generation experience. You can learn more about it here.

The [zoom_enhance] shortcode is inspired by the fictional technology from CSI, where they can magically zoom in on any pixelated image and reveal crisp details. Of course, this is not possible in real life, but we can get pretty close with Stable Diffusion and some clever tricks.

The shortcode allows you to automatically upscale small details within your image where Stable Diffusion tends to struggle. It is particularly good at fixing faces and hands in long-distance shots.

How does it work?

The [zoom_enhance] shortcode searches your image for specified target(s), crops out the matching regions and processes them through [img2img]. It then blends the result back into your original image. All of this happens behind-the-scenes without adding any unnecessary steps to your workflow. Just set it and forget it.
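The blend-back step is the interesting part. Below is a rough PIL sketch of the general idea (the crop is enhanced elsewhere via img2img, then pasted back under a feathered mask). This is not the Unprompted implementation, just an illustration of the gaussian-blurred seam blending described here.

```python
from PIL import Image, ImageFilter

def paste_enhanced(original, enhanced, box, feather=8):
    """Blend an img2img-enhanced crop back into the original image.

    box is (left, upper, right, lower) and must match enhanced.size;
    the mask edges are blurred so the seam disappears.
    """
    result = original.copy()
    # Build a white mask, pad it with black, blur, and crop back so the
    # borders fade smoothly from opaque to transparent.
    mask = Image.new("L", enhanced.size, 255)
    mask = mask.crop((-feather, -feather, mask.width + feather, mask.height + feather))
    mask = mask.filter(ImageFilter.GaussianBlur(feather))
    mask = mask.crop((feather, feather, feather + enhanced.size[0], feather + enhanced.size[1]))
    result.paste(enhanced, box, mask)
    return result
```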

Features and Benefits

  • Great in both txt2img and img2img modes.
  • The shortcode is powered by the [txt2mask] implementation of clipseg, which means you can search for literally anything as a replacement target, and you get access to the full suite of [txt2mask] settings, such as "padding" and "negative_mask."
  • It's also pretty good at deepfakes. Set mask="face" and replacement="another person's face" and check out the results.
  • It applies a gaussian blur to the boundaries of the upscaled image which helps it blend seamlessly with the original.
  • It is equipped with Dynamic Denoising Strength which is based on a simple idea: the smaller your replacement target, the worse it probably looks. Think about it: when you generate a character who's far away from the camera, their face is often a complete mess. So, the shortcode will use a high denoising strength for small objects and a low strength for larger ones.
  • It is significantly faster than Hires Fix and won't mess up the rest of your image.
  • Compatible with A111's color correction setting.

How to use it?

To use this feature, you need to have Unprompted installed on your WebUI. If you don't have it yet, you can get it from here.

Once you have Unprompted, simply add this line anywhere in your prompt:

I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), which is very powerful for super-resolution, and it is also compatible with MultiDiffusion. Through initial tests I found it amazing. I believe this can beat their new feature in a compelling way.

The automatic mask technology doesn't seem very compatible with MultiDiffusion txt2img, but I will try it in img2img.

How long does it take you to upscale a photo, and how can it be made faster? Here are my settings:
image
image
image

I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), which is very powerful for super-resolution, and it is also compatible with MultiDiffusion. [...]

Really impressive. Do you know of a user-friendly UI for DDNM?
MultiDiffusion is a great idea, by the way.

Update: sadly, I found that DDNM only works in pixel space, not latent space, so the result is not usable for latent diffusion models. I'm consulting the original authors. Here is my result via my self-implemented test extension:

image

So while this is great for faces, it completely destroys fingers at anything but 0.1 denoise. Still, it's a great improvement over Ultimate Upscaler in terms of speed.

https://www.reddit.com/r/StableDiffusion/comments/11pyiro/new_feature_zoom_enhance_for_the_a111_webui/

New Feature: "ZOOM ENHANCE" for the A111 WebUI. Automatically fix small details like faces and hands! [...]

What comes next? What's the prompt?

You say this is supposed to be faster than Ultimate SD Upscale, but it feels about the same, if not worse, to me? I don't know what's set wrong, because a 3090 should at least be comparable to a V100... but I can't even use the same settings (if I set the tiling to 128x128, I get an out-of-memory error - which is even weirder, since I have more VRAM), and a 20-step batch to double an image size takes 10 minutes. The performance seems outright bad.

Then there's the issue where, if you try to upscale anyone with a tan, it removes it in the process, but that's just down to not being able to use specific prompts.

+1 to @RainehDaze's comment on performance - I have an RTX 3090 as well and I can't get anywhere close to the supposed ~1 minute to upscale that the V100 got. Mine also takes close to 10 minutes to upscale to 2160p/4K @ 20 steps.

10 minutes is absolutely crazy. There must be incorrect params. What are your params?

I'll give you detailed screenshots of mine.

For example, when you set overlap = 48 and tile size = 96, tile batch size=8, it will consume 2m38s in total.
image
Here are my parameters:
image
And the input image is here:
MTXX_MR20230222_061322786
The test for overlap=32 and tile size=96 is here; this time it only takes 1m49s. If you reduce the overlap further, it becomes even faster:
image

And the output image is like this:
00002-3421163112

Is it something about the tiling VAE? It set itself by default to size 3092 and doesn't kick in at all.

Everything else seems pretty similar. Hell, it was taking 10 minutes to double a 768x768 image.


Please refer to my params. The problem is in the MultiDiffusion settings, not the Tiled VAE.

The key factors are overlap and tile batch size. A larger overlap will significantly slow you down, and a larger tile batch size (which uses more VRAM) will significantly increase your speed.
Normally, the tile size should be set to 96 in most cases.
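As a rough sketch of why overlap and tile batch size dominate the runtime: the number of UNet passes per sampling step grows as the overlap grows and shrinks as the tile batch grows. The extension's exact tiling arithmetic may differ slightly; this is only for intuition.

```python
import math

def unet_passes(latent_w, latent_h, tile=96, overlap=32, tile_batch=8):
    """Approximate UNet forward passes per sampling step."""
    stride = tile - overlap                     # smaller stride -> more tiles
    cols = math.ceil(max(latent_w - overlap, 1) / stride)
    rows = math.ceil(max(latent_h - overlap, 1) / stride)
    return math.ceil(cols * rows / tile_batch)  # bigger batches -> fewer passes

# A 4096x3200 output corresponds to a 512x400 latent.
print(unet_passes(512, 400, overlap=48))  # 10 passes per step
print(unet_passes(512, 400, overlap=32))  #  6 passes per step, hence faster
```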

Update: the regional prompt control is nearly complete. After that, MultiDiffusion should perform better with your custom per-region prompts!


And this is with tiling VAE on, yes?

It's still taking about 8 minutes, even with all settings the same (except mask blur, since I'm not inpainting and I'm not sure where that even comes into this).


Really weird. May I have a screenshot of your params? Mine are like this (denoise=0.3):

image

Okay, comparing and trying again: the decoder tile size was lower, the latent tile batch size was at 1, and the overlap (following the parameters in the PNG info above) was 48. Matching the settings exactly, it goes down to about 3:30 total, which is definitely an improvement.

That still seems a little odd (shouldn't a 3090 outperform a Tesla V100?); I wonder if it's because I have half VAE disabled to stop overflow errors?

I'm happy to see your great improvements (10min->3m30s)!

Theoretically you should be faster than me. But there are too many params and variables in the whole SD process - half VAE, samplers, upscalers, denoise strength, other extensions, LoRA, ControlNet - so I'm also not sure what the root cause is.

If you have some findings, please also tell me and I will be happy to find room for optimization.

I guess if the other guy with a 3090 chips in, we might have a point for comparison if the numbers wind up massively different.

Although, can you go to a higher latent tile size? Because that definitely suggests something might be off on my end; a 3090 has more VRAM.

However, a larger tile size won't increase your image quality; on the contrary, it may lower it. As most models are not trained on images larger than 1024*1024, 128 (just 1024/8) is the largest latent tile size I would recommend.

You can try; I believe a larger size gives you faster speed at the cost of image quality. Also, a tile batch size of 8 is the best choice, where the UNet achieves its highest throughput; more than 8 won't be faster.

See, I'm not questioning whether 128x128 would lead to better quality, I'm just asking if you can run it. If I try running on greater than 96x96, I get an OoM error, so I'm wondering if that's an environment thing.

Oh, I'm on a 32 GB NVIDIA V100. The UNet itself will consume more than 24 GB when you use a 128 latent tile size together with a tile batch size of 8, so OOM is natural. The details are as follows:

Basically, with MultiDiffusion you are drawing tile_batch_size images of size (8 × tile size) simultaneously. For example, if you set tile size = 128 and batch size = 8, then in each forward pass the VRAM usage of the UNet will be identical to drawing eight 1024*1024 images without MultiDiffusion.

Hence, if you get OOM errors in normal generation with batch size = 8 and image height & width = 1024, you will get them with MultiDiffusion too.
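In other words, the equivalence is just latent tile size × 8 pixels per side, times the tile batch size; a tiny helper makes the comparison explicit (assuming the standard SD VAE downscale factor of 8).

```python
def equivalent_plain_batch(latent_tile, tile_batch_size, vae_scale=8):
    """Express a MultiDiffusion tile batch as an ordinary generation batch."""
    side = latent_tile * vae_scale
    return f"batch of {tile_batch_size} images at {side}x{side}"

print(equivalent_plain_batch(128, 8))  # batch of 8 images at 1024x1024
print(equivalent_plain_batch(96, 8))   # batch of 8 images at 768x768
```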

Ah, that explains it, I thought it would be a 16GB Tesla.

I have been messing around with this for a few days and have found it works best with one dimension set to 256 and the other set to around 64, with a small overlap of 4 or 8.
I used ControlNet Canny to make this anime drawing more realistic and detailed with only 10 steps.
I am using an RTX 3060, FYI.
tmp8bgc72w0
2342310350764

For anyone on a 3090, just raise the batch size to 8 and leave everything else at default. Mine can make a 2000x2000 image in 30 seconds.

23,03,10 - 16,01,21 - 7331 a Left is mine, right is pkuliyi2015's. As you can see, the left has way more details, but also some noise and weird issues - pure Remacri 4x looks almost like pkuliyi2015's version. Plenty of room for tests.

Hi, can you tell me what upscaler you use?

Secret - I've been digging into the topic of upscaling for a month, so the answer is not as simple as "click here and there" (sampler and model have a gigantic impact on results, and then you tinker with CFG and denoise).

A 3090/4090 works best at batch size 9 (learned that from Dreambooth training), but indeed 8 divides nicely - and in my case 16, because I can and it's 5% faster. Personally, I go with the following settings.

23,03,18 - 22,01,45 - 7436 a


thank you!

Maybe this is common knowledge, but I don't see it mentioned here: Automatic1111 allows you to change the initial noise multiplier. At 0.5, it will barely change the initial image, which allows for higher denoise values without substantially changing the image. At 1.5, it will introduce a lot of noise even at low denoise values and requires a high CFG, but it will add more texture and detail to the resulting image.
noisemultiplier

10 minutes is absolutely crazy. There must be incorrect params. What's your params?
For example, when you set overlap = 48 and tile size = 96, tile batch size=8, it will consume 2m38s in total. image

Getting back to testing this more on my end, but from that screenshot alone I'm already seeing a difference in what my console outputs during the upscaling process - I get multiple progress bars when performing any img2img. Maybe there are some settings affecting how MultiDiffusion behaves?

image

Additionally, there's a huge difference if the denoising steps are set to be the same as the sampling steps, which I had enabled in my previous tests.

Currently, with the latest commit (117bbf1a) I can upscale 512x512 @ 4x with these settings in about 1m30s now:
image


Of course changing the denoising steps to not be the same as the sampling steps makes a huge difference; you're running fewer steps in total.


image
Are you saying I should be getting those posted times even with this enabled? (I believe it's off by default.) That's what I wanted to confirm - whether it should be on or off for the times I'm getting.


Pretty sure, yeah.

It's not a good comparison if you toggle that off, because then you're running a different number of steps with and without MultiDiffusion.

Your biggest limitation seems to be that you're only running a batch size of 1; setting that to 8 should speed things up drastically (as it'll do 8 small tiles at once rather than sequentially).


Thanks for catching that; 1m30s is for a batch size of 8. I probably screencapped before adjusting the settings.

@pkuliyi2015 This looks to have resolved my performance issues: the denoising-steps option in img2img was resulting in higher-than-normal times. Once I compensate for that setting (e.g. multiplying the 20 steps by 0.3 to get ~6-7 steps with the setting enabled), it takes ~1 minute to render a 4K output.
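The compensation is simple arithmetic, assuming the toggle in question is the webui option that makes img2img run the exact step count instead of scaling it by denoising strength:

```python
import math

steps, denoise = 20, 0.3
# Without the option, img2img runs roughly steps * denoise steps, so with it
# enabled you can divide the work back down to the usual count.
compensated = max(1, math.ceil(steps * denoise))
print(compensated)  # 6, matching the ~6-7 steps mentioned above
```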

@pkuliyi2015 Do you see any use for the new SD 2.1 unCLIP model? If it were somehow implemented, there might be no need for regional prompts...


I knew about it and tried their demo when they first published it. But in fact, it is far from satisfying to me.

By the way, I have completed the major update on regional prompt control. Currently, it may be the most competitive extension for training-free compositional large image drawing.


Does Tiled VAE need to be enabled when using MultiDiffusion? That was not clear to me.


Tiled VAE is not a must; it just saves VRAM when you want to generate larger images. It won't affect your results and is compatible with almost everything (like Hires fix).

Hello everyone,
I'm fairly new to this whole topic, and I was trying to replicate some of your settings in A1111.
I'd like to achieve the detailed upscaled look with UltraSharp, but the images just get regularly upscaled like ESRGAN - no added details at all, and sometimes even worse.
I only have 4 GB of VRAM, so I can upscale to at most 1024x1024; otherwise the system tells me that I don't have enough memory.

I don't know if there's anything in the settings that I must change.
I have the regular UI, no MultiDiffusion or anything like that.

I just want to upscale an image that I made with Midjourney to bring back some details, especially in the face area.
But as I said, img2img or Extras didn't do anything for me except upscale it without any details.


I made a few comparisons with Ultimate Upscaler (default settings, CFG 10, DDIM, denoise 0.23) and Mixture of Diffusers.
The original image
00026-57791799

vs Denoise 0.23, DDIM:
https://imgsli.com/MTY2Njkw

vs Denoise 0.35 DDIM
https://imgsli.com/MTY2Njkx

vs Denoise 0.35 Euler A - cfg 14
https://imgsli.com/MTY2Njky

MD is good at adding extra details without overcooking the image; you can go with high denoise and CFG. But as far as upscaling goes, Ultimate SD Upscaler still gives less pixelated texture when you zoom in, especially on the hands and face.
Parameters for MD:
Tiled Diffusion upscaler: 4x-UltraSharp, Tiled Diffusion scale factor: 2, Tiled Diffusion: "{'Method': 'Mixture of Diffusers', 'Latent tile width': 64, 'Latent tile height': 64, 'Overlap': 48, 'Tile batch size': 4, 'Upscaler': '4x-UltraSharp', 'Scale factor': 2, 'Keep input size': True}"
I got much worse results with the recommended settings. Maybe I'm doing something wrong.

I inpainted a bit before upscaling; here is the actual original image if anyone wants to try it out:
https://files.catbox.moe/wek7ed.png

Hm, I've been using the region prompt, and I've noticed that if anything, it seems even worse about concatenating random people on the boundaries--even if there's nothing in the main prompt about people.

This sort of thing is nearly constant:
image

In many regards, it's performing even worse than just straight generating a 1024x1024 image. 20 images in a batch, and only one didn't have extra people (instead, it duplicated the entire horizon):
image


Because most models are trained at 512x512, this is more likely to occur the larger the images you try to create. I would first check that your batch size in MultiDiffusion is 1. Increasing the tile size and/or overlap may decrease the likelihood of this occurring. Try 80x80 with an overlap of 16 and a latent tile batch size of 8, or 96x96 with an overlap of 32, or even 128x128 with an overlap of 64 and a batch size of 4. If that does not fix it, then create the images at 768x768 or even 512x512 and upscale them.
The greater the overlap, the more context each tile has from its surrounding tiles.


The point is, region control is supposed to help avoid exactly these things while composing larger images. That is literally its purpose, and what the demonstration pictures were showing.


Have you tried what I suggested?


You realise your suggestions are totally irrelevant, right?

Like, the point of region prompting is that you can have a larger image (with a background prompt using the usual MD merging), and then a specific foreground region (or regions) that are meant to contain specific things. It's even spelled out on the main page, including that tile size doesn't really matter for this one.

Creating an image at 512 or 768 and upscaling also completely defeats the point, which is that your standard SD-sized generation would only be a component of an image with a different aspect ratio, and not full of body part concatenation.

(I think it might actually be that some quality-related things tend to act as catalysts for drawing people; I'm not sure and I'm going to keep poking away)


image

The image height is 1024. Eight tiles are in a batch, so a tile height of 128 prevents the problem you are having. You can see my settings in the screenshot.


It doesn't, actually, because 128x128 tiles were what I was using when I was testing, and the concatenation kept happening. I'm pretty sure, after some more tests, that random tokens were actually prompting for people (for some unspeakable reason). Getting to this sort of thing consistently was a matter of changing the prompt settings, not messing with the tiles:
image


If you check PowerShell or the command line, it shows that at 128x128 MultiDiffusion does not take effect because the image is too small. You have to use a tile width smaller than 128.

Looking at the command line, MultiDiffusion was doing its thing, probably because of region control. Which, again, to reference the MAIN PAGE FOR THIS REPO, says (with regard to region prompts):

"The tile size parameters become useless; just ignore them"

Seriously, do you think the person maintaining this knows less about how it works than you do?


He is probably referring to it in the context of img2img, not txt2img.
And yes, it is possible to know more about how to use a tool than the person who made it. Musicians are better at their instruments than the people who made them.

That's like saying a guitar player knows more about how an amplifier works.


No. It's like the thing I said. You can't just come up with a different analogy to discredit my first one.


In fact, that is the exact opposite of my original analogy, which is that the artist can utilize the tool better than its creator. It does not imply that the artist has the ability to design or create the tool.

Your analogy was flawed, because I said how it works. The creator of something is more likely than a user to know whether a certain setting actually does anything in a given mode, even if the user is extremely good with it.

Anyway, I did more testing. It was the prompt causing humans to be generated where they really shouldn't be (like the entire half of an image that was only supposed to be scenery) and concatenating things when adjacent. Seriously, it was doing things like this:
image
or this:
image
When there was supposed to be only scenery to either side (and obviously nothing was describing those particular people). As I noted, it seems that a lot of tags that describe image quality are actually tied really strongly to generating people.

Thank you for your efforts on this. This is a classic noise-pollution problem, where the foreground noise triggers undesirable multi-character changes in the background when your model is not that good at high-resolution image generation.

This can be partly mitigated by adding some negative prompts in the background regions. However, this may not solve the problem entirely. I am considering a much more powerful merging strategy and a corresponding UI that lets you fuse images better.

You will definitely like it.


It wasn't too bad once there were no triggering tags in the general prompt (only 5 or 6 out of 100), and I got this out of it all with region control and the noise inversion:
00308-2862577431-masterpiece_best_quality_highres_extremely_detailed_8k_wallpaper_very_clear

But anything that would make for better image composition is great (only about 9 of the 100 had reasonable background coherency).

Hello, I am trying to use MultiDiffusion to place kemono characters in the background, but the checkpoint I am using requires Hires fix and a hypernet to be enabled by default; otherwise it generates humans.

The overall prompt only describes the camera and background, as well as enabling the hypernet. I enter character prompts for the foreground, and no prompts for the background. The first few steps of the denoising process generate kemono normally, but in the end the Hires fix transforms the character into a human. I tried reducing the Hires fix denoising value, but that results in fewer and blurrier image details; increasing the denoising makes the character more human-like.

I don't know whether this situation is due to an incompatibility between Hires fix and MultiDiffusion, or whether the hypernet did not start properly.

grid-0182bf1af6144efcd030d691538214e17759d15a55e1


I made a trial fix. Please switch to the dev branch and test it. If it works, please tell me promptly.


It doesn't work well. The first image uses MultiDiffusion with Hires fix denoising = 0.7, while the second image does not use MultiDiffusion.
You can see that using MultiDiffusion generates completely different characters; the third image is a screenshot of the denoising process.

grid-0201
grid-0200
20230409105552

I tried turning off Hires fix when using MultiDiffusion in txt2img and moving the generated blurry image to img2img, but the background details did not increase. To be honest, it only became higher-definition, while Hires fix can add things that were not in the original image.

01407-1454243371-(masterpiece_1 3), (2D_1 0), (anime_1 0), (illustration_1 0), (sharp_1 2),_(hard light_1 0), (shadow_1 0),(reflection, refractio
00035-3734885203-(masterpiece_1 3), (2D_1 0), (anime_1 0), (illustration_1 0), (sharp_1 2),_(hard light_1 0), (shadow_1 0),(reflection, refractio (1)

I also tried three other models suggested by the checkpoint author, none of which requires a hypernet to be enabled. However, two of these models also had the problem of the character changing when both Hires fix and MultiDiffusion were enabled, while the other model was able to generate kemono characters normally.
If you are interested, the model address is below.

https://civitai.com/models/11888?modelVersionId=32830

I have verified that the model that works normally with MultiDiffusion is crossfemono2.0, while the models that do not work normally are G, G2, F, and D.

00662-343669817-Masterpiece, best quality, highres, ultra-detailed 8k unity wallpaper, bird's-eye view, trees, ancient architectures, stones, fa
Hello, when I use the Along the River During the Qingming Festival painting together with ControlNet to generate a very long image, it doesn't seem to take effect. What could be the reason? Is it because the preprocessor resolution is not high enough?

"RuntimeError: Invalid buffer size: 6.89 GB" How to solve it?

I get 'min and input tensors must be of the same shape' with Tiled VAE.

The instructions say to get the 4x-UltraSharp upscaler and put it in the ESRGAN folder, but I didn't find the relevant folder.
dad


The folder can be found at ...\stable-diffusion-webui\models\ESRGAN

Could the author show how to generate a realistic-style Along the River During the Qingming Festival painting through the interface? That would make it easier for me to understand Tiled Diffusion, region prompts, and drawing a full-canvas background. Thank you very much.

01957-1152809661-Masterpiece,-best-quality,-highres,-ultra-detailed-8k-unity-wallpaper,-bird's-eye-view,-trees,-ancient-architectures,-stones,-fa
As you can see, I don't know how the region prompts and the "Draw full canvas background" option you mentioned apply to this painting.

To those of you asking questions on a closed discussion, you need to take some lessons from an old master at the art of asking questions online.

Is there a setting that works on a high-end 16-inch Intel MacBook Pro with 16 GB of RAM and an AMD Radeon Pro 5500M with 6 GB of VRAM?

And does it matter which Python and PyTorch versions are used? Currently, the desired image size cannot be created with Python 3.10.12 and PyTorch Nightly 2.1.0. If the R-ESRGAN 4x+ scale exceeds 1.7 at 512 size, cmd exits with an MPS memory shortage error.
I followed the settings as described in the description, but it fails.