aszala/DiagrammerGPT

Some questions about Figure 3 and Figure 7

Hi, thanks for your fascinating work!

I just skimmed the paper today, and I have some questions.

  1. About the "Text Label Rendering" module:
    As far as I understand, this is done directly with the Python Pillow package (per "...we explicitly render clear text labels on the diagrams following diagram plan with the Pillow Python package.").
    Then, (1.1) do I need to tune the font size, font color, etc. for each image?
    And (1.2) can I also directly use the original GLIGEN with this Text Label Rendering module?

  2. About Figure

It's quite impressive that DiagrammerGPT can render all the objects' locations correctly! But I have a few questions.
(2.1) How did you get the DALL-E 3 results?
(2.2) In the last row, DiagrammerGPT seems to misunderstand "tulip", "daisy", and "sunflower". Do you think this is a limitation of GLIGEN, or a limitation of DiagrammerGPT?

Looking forward to your answer, as I do want to understand the paper better! Thank you!

@thaoshibe

  1. My impression is that the Pillow package is just used to render text on top of the generated image. I don't think it's possible to include Pillow and fine-tune font styles, afaik.
  2. You'd need to fine-tune GLIGEN to yield images with better quality. GLIGEN uses SD 1.4, and DiagrammerGPT also seems to be fine-tuned on top of the SD 1.4 base model.
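
For reference, here is a rough sketch of what "rendering text on top of the generated image" with Pillow could look like. This is not the authors' code: the function names, the label format (text plus a normalized `(x0, y0, x1, y1)` box), and the choice to draw at the box's top-left corner are all assumptions made for illustration.

```python
def box_to_pixels(box, width, height):
    """Convert a normalized (x0, y0, x1, y1) box to integer pixel coordinates.

    Assumption: boxes are given in [0, 1] relative to the image size.
    """
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def render_text_labels(image, labels, fill="black", font=None):
    """Return a copy of `image` with each label drawn at its box's top-left corner.

    `labels` is a list of (text, box) pairs (hypothetical format).
    `fill` and `font` are passed to Pillow, so color and font are plain
    parameters rather than anything learned by the diffusion model.
    """
    # Imported here so the coordinate helper above works without Pillow installed.
    from PIL import ImageDraw, ImageFont

    out = image.copy()  # leave the generated image untouched
    draw = ImageDraw.Draw(out)
    font = font or ImageFont.load_default()
    width, height = out.size
    for text, box in labels:
        x0, y0, _, _ = box_to_pixels(box, width, height)
        draw.text((x0, y0), text, fill=fill, font=font)
    return out
```

Something like `render_text_labels(img, [("sun", (0.1, 0.1, 0.3, 0.2))])` would then work on any `PIL.Image`, whichever model produced it. Font size can be changed by passing a font from `ImageFont.truetype(path, size)`, where `path` points to a font file on your machine.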