Some questions about Figure 3 and Figure 7
Hi, thanks for your fascinating work!
I just skimmed the paper today, and I have some questions.
-
About the "Text Label Rendering" module:
As far as I understand, this comes directly from the Python Pillow package (per "...we explicitly render clear text labels on the diagrams following diagram plan with the Pillow Python package.").
Then, (1.1) do I need to tune the font size, font color, etc. for each image?
(1.2) Can I also use the original GLIGEN directly with this Text Label Rendering module?
-
About Figure
It's quite impressive that DiagrammerGPT can render all the objects' locations correctly! But I have a few questions.
(2.1) How did you obtain the DALLE-3 results?
(2.2) In the last rows, DiagrammerGPT seems to misunderstand "tulip", "daisy", and "sunflower". Do you think this is a limitation of GLIGEN or of DiagrammerGPT?
Looking forward to your answer, as I do want to understand the paper better! Thank you!
- My impression is that the Pillow package is just used to render text on top of the generated image. Afaik, it isn't possible to include Pillow and fine-tune font styles.
- You'd need to fine-tune GLIGEN to get better-quality images. GLIGEN uses SD1.4, and DiagrammerGPT also seems to have been fine-tuned on top of the 1.4 base model.
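
For reference, overlaying text labels on an already generated image with Pillow can be sketched roughly as below. This is only a minimal sketch under my own assumptions, not the authors' implementation: the `render_labels` helper, the label positions, and the blank canvas standing in for a generated diagram are all hypothetical. Only the Pillow calls themselves (`Image.new`, `ImageDraw.Draw`, `ImageDraw.text`, `ImageFont.load_default`) are real API.

```python
from PIL import Image, ImageDraw, ImageFont


def render_labels(image, labels, font=None, fill="black"):
    """Draw each (text, (x, y)) label onto a copy of `image`.

    Hypothetical helper for illustration; the paper's actual code may differ.
    """
    out = image.copy()  # leave the input image untouched
    draw = ImageDraw.Draw(out)
    font = font or ImageFont.load_default()  # Pillow's built-in bitmap font
    for text, (x, y) in labels:
        draw.text((x, y), text, font=font, fill=fill)
    return out


# Stand-in for a generated diagram: a blank white canvas (assumption).
canvas = Image.new("RGB", (256, 256), "white")
labeled = render_labels(
    canvas,
    [("sunflower", (20, 30)), ("tulip", (140, 200))],  # made-up positions
)
```

Font size and color are per-call arguments here (`font`, `fill`), so in a setup like this they could be set once and reused across images rather than tuned per image; whether that matches what the paper does is something the authors would have to confirm.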