Evaluating and taking the open-source text-to-image community one step further.
In this research, I seek to build on the open-source image generation model Stable Diffusion V2.1 to better generate images that contain multiple subjects. I first investigate why discrepancies in image quality exist between open- and closed-source image models. I then use natural language processing techniques and grounding inputs to improve the accuracy of multi-subject image generation with a new Multi-Subject Render model. Finally, I evaluate these results holistically.
This research builds on GLIGEN, whose project page is at https://gligen.github.io/.