Some questions about Figure 3 and Figure 7

Question

Opened this issue a year ago · 1 comment

Hi, thanks for your fascinating work!

I just skimmed the paper today, and I have some questions.

About the "Text Label Rendering" module:
As far as I understand, this comes directly from the Pillow Python package (per "...we explicitly render clear text labels on the diagrams following diagram plan with the Pillow Python package.").
(1.1) Do I need to tune the font size, font color, etc. for each image?
(1.2) Can I also use the original GLIGEN directly with this Text Label Rendering module?
About Figure

It's quite impressive that DiagrammerGPT can render all the objects' locations correctly! But I have a few questions.
(2.1) How did you get the DALLE-3 results?
(2.2) In the last rows, DiagrammerGPT seems to misunderstand "tulip", "daisy", and "sunflower". Do you think this is a limitation of GLIGEN or of DiagrammerGPT?

Looking forward to your answer, as I do want to understand the paper better. Thank you!

Answer 1 · 2023-10-29T17:42:53.000Z

@thaoshibe

My impression is that the Pillow package is only used to render text on top of the generated image. As far as I know, it isn't possible to include Pillow and fine-tune font styles.
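
For reference, here is a rough sketch of what rendering labels on top of an already generated image with Pillow could look like. This is not the DiagrammerGPT authors' code. The function names, the normalized `(x0, y0, x1, y1)` box format, and the default font size and color are all assumptions made for illustration. The point is just that font settings are plain arguments you can set once or per image.

```python
def to_pixel_box(norm_box, width, height):
    """Convert a normalized (x0, y0, x1, y1) box in [0, 1] to pixel coordinates.

    The normalized box format is an assumption, not the paper's plan format.
    """
    x0, y0, x1, y1 = norm_box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def render_labels(image, labels, font_path=None, font_size=16, color="black"):
    """Draw each label centered in its box on a copy of a PIL image.

    `labels` is a list of (text, normalized_box) pairs (hypothetical format).
    `font_path` is an optional path to a .ttf file; without it, Pillow's
    built-in default font is used and `font_size` has no effect.
    """
    # Imported here so the coordinate helper above needs no third-party package.
    from PIL import ImageDraw, ImageFont

    out = image.copy()
    draw = ImageDraw.Draw(out)
    if font_path:
        font = ImageFont.truetype(font_path, font_size)
    else:
        font = ImageFont.load_default()

    for text, norm_box in labels:
        x0, y0, x1, y1 = to_pixel_box(norm_box, *out.size)
        # Measure the text at the origin, then shift it so that its
        # bounding box is centered inside the target box.
        left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
        x = (x0 + x1) / 2 - (left + right) / 2
        y = (y0 + y1) / 2 - (top + bottom) / 2
        # A white backing rectangle keeps the label readable on busy images.
        draw.rectangle(draw.textbbox((x, y), text, font=font), fill="white")
        draw.text((x, y), text, fill=color, font=font)
    return out
```

You would call it as something like `render_labels(img, [("sunflower", (0.1, 0.1, 0.4, 0.2))], font_size=20, color="red")`, where `img` is whatever image the generator produced. Since it only draws on top of the image, it should not matter which model generated the image.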
You'd need to fine-tune GLIGEN to get better-quality images. GLIGEN uses SD1.4, and DiagrammerGPT also seems to have been fine-tuned on top of the 1.4 base model.