YangLing0818/RPG-DiffusionMaster

Any updates on a ComfyUI solution?

camoody1 opened this issue

With yesterday's disclosure that ELLA for SDXL will not be released as open source, everyone seems to be running around looking for a savior in this space. I'm just wondering whether any progress has been made towards a set of ComfyUI nodes for this that doesn't require 15GB+ of VRAM.

🙏🏼

Xyem commented

Just mentioning here as well, since the issue I commented on (#1) is currently closed: I'm looking into using my Oobabooga custom nodes to replicate how this works, potentially by tying them together with ImpactPack's regional samplers (though I intend to keep each step separate, so they can be used by other things like regional conditioning).
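To give an idea of the glue I have in mind between the LLM output and the regional samplers, here's a rough sketch (plain Python, nothing Oobabooga- or ImpactPack-specific; the (x, y, width, height) region format and the equal vertical split are just illustrative):

```python
# Hypothetical glue step: pair each LLM-generated sub-prompt with an image
# region, ready to feed into per-region conditioning (e.g. ComfyUI's
# ConditioningSetArea node or ImpactPack's regional samplers).

def assign_regions(subprompts: list[str], width: int, height: int) -> list[dict]:
    """Naively split the canvas into equal vertical strips, one per sub-prompt."""
    strip = width // len(subprompts)
    return [
        {"prompt": p, "region": (i * strip, 0, strip, height)}  # (x, y, w, h)
        for i, p in enumerate(subprompts)
    ]

regions = assign_regions(
    [
        "a handsome young man with blonde curly hair and black suit, in a bar",
        "a black twintail girl in red cheongsam, in a bar",
    ],
    width=1024,
    height=1024,
)
print(regions)
```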

Personally, this works for me because I have multiple machines for running each aspect of the process, but I'm hoping it's a nice approach for others too: being able to load quantised models can reduce the VRAM requirements considerably. I've also seen an extension for Oobabooga that loads/unloads models on API calls, and other services can be used as well (the nodes are written for Oobabooga, but they should work with anything OpenAI-API-compatible).
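For the LLM step itself, a call like this should work against any OpenAI-compatible endpoint (a minimal sketch: the port, system prompt, and line-per-region output format are my assumptions, not anything RPG-specific):

```python
import requests

# Assumed local endpoint; any OpenAI-API-compatible server should look the same.
API_URL = "http://127.0.0.1:5000/v1/chat/completions"

SYSTEM_PROMPT = (
    "Split the user's image prompt into one sub-prompt per subject or region, "
    "one per line, repeating shared scene details in each."
)

def split_prompt(prompt: str) -> list[str]:
    """Ask the LLM to decompose a prompt into regional sub-prompts."""
    resp = requests.post(
        API_URL,
        json={
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            "temperature": 0.2,  # keep the decomposition fairly deterministic
        },
        timeout=120,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    return [line.strip() for line in text.splitlines() if line.strip()]
```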

I'll ping here when (if?) I have something at least somewhat working! :)

Xyem commented

I think I've managed to replicate the output of RPG-Diffusion using just CLIP nodes with the built-in KSampler (which, to my understanding, is how RPG-Diffusion works). I wasn't getting good results with regional sampling originally, but with this working, I think another attempt is in order!

The left set shows generations using the original prompt; the right set uses LLM-aided CLIP attention. It's a replication of the "A handsome young man with blonde curly hair and black suit with a black twintail girl in red cheongsam in the bar." example.

[screenshot: original prompt (left) vs. LLM-aided CLIP attention (right)]
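For anyone who wants to try this before I have a shareable workflow, the prompt-side rewrite is roughly the sketch below (the weight value is illustrative; the actual work is done by ComfyUI's built-in (text:weight) attention syntax in a CLIPTextEncode node):

```python
def to_weighted_prompt(subprompts: list[str], weight: float = 1.2) -> str:
    """Merge LLM-generated sub-prompts into a single prompt string,
    up-weighting each subject with ComfyUI's (text:weight) syntax."""
    return ", ".join(f"({p}:{weight})" for p in subprompts)

print(to_weighted_prompt([
    "a handsome young man with blonde curly hair and black suit",
    "a black twintail girl in red cheongsam",
    "in the bar",
]))
# -> (a handsome young man with blonde curly hair and black suit:1.2), ...
```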

@Xyem Yes! That's certainly much better! Please keep us updated on your progress.

Great! How did you do it? Can you share your workflows? @Xyem