CoplayDev/unity-mcp

LLM Execution Optimization and Benchmark Discussion


The major pain point I am experiencing is that the LLM does not have a spatial/contextual understanding of certain commands. This happens when I ask it to generate a populated city, a Dust 2 map prototype, or a garden. The user prompt is meant to stay simple, so I think we need to add a metaprompt and guidance for the LLM, especially around Unity execution, to make it smarter.

What I envision right now is a simple GUI input/button to start: the user describes their intent, and the LLM uses ManageScene to gather the current scene JSON and build an understanding of what to do. It can get more complex than that, but a starting point where the LLM knows the explicit, general goal would be good (a rough sketch of the flow is below).
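To make that concrete, here is a minimal sketch of what the flow could look like on the MCP server side, in Python. To be clear, this is not the project's actual API: `get_scene_snapshot` is a hypothetical stub standing in for a ManageScene call, and the guidance wording is just a placeholder.

```python
import json

# Hypothetical stub: in the real flow this data would come from the
# ManageScene tool; it is hard-coded here so the sketch is self-contained.
def get_scene_snapshot() -> dict:
    return {
        "sceneName": "SampleScene",
        "rootObjects": [
            {"name": "Main Camera", "position": [0, 1, -10]},
            {"name": "Directional Light", "position": [0, 3, 0]},
        ],
    }

# Guidance prepended to every request so the model plans layout
# (ground plane, spacing, parenting) before it starts creating objects.
UNITY_GUIDANCE = (
    "You are generating content in a Unity scene. "
    "Before creating objects, plan positions on the XZ ground plane, "
    "keep consistent spacing, group related objects under empty parents, "
    "and reuse primitives or prefabs instead of one-off meshes."
)

def build_metaprompt(user_intent: str) -> str:
    """Combine the fixed guidance, the current scene state, and the
    user's short intent into a single prompt for the LLM."""
    scene_json = json.dumps(get_scene_snapshot(), indent=2)
    return (
        f"{UNITY_GUIDANCE}\n\n"
        f"Current scene state (from ManageScene):\n{scene_json}\n\n"
        f"User intent: {user_intent}\n"
        "Produce a step-by-step plan of tool calls before executing them."
    )

if __name__ == "__main__":
    print(build_metaprompt("Generate a small garden with simple plant shapes."))
```

The point is just that the user's one-liner never reaches the model alone; it always arrives wrapped with the scene snapshot and the Unity-specific guidance.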

Also, it is hard to find a benchmark for this kind of testing scenario! We should think of possible test cases and generation tasks to exercise the tool in the future.

This could turn into a very long discussion about optimizing LLM execution; maybe Coplay has experience tackling this issue, but I am very open to the discussion.

Agreed, it's a very smart idea to take a "test-driven development" kind of approach, where we pick a few scenarios like you said and work on the system until it can handle them.

  • Garden with simple plant shapes is a great one.
  • Simple city is a good one.
  • Simple start menu is a good one (lots of people asking about UI).
  • 2D Galaga clone would be a more sophisticated one.

Any others?

Unity's included projects are good testing grounds as well!

It has been a while, but I want to pick this subject up again as I am working on batch & vision input for the MCP. Is there a way we can build a benchmark prompt together that tests the MCP's robustness, with each function exercised by 2-3 tasks? A rough sketch of what that could look like is below.
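To seed that discussion, here is a rough sketch of how such a suite could be structured, again in Python. The tool names and pass conditions are only my assumptions for illustration, not the MCP's actual tool set; the idea is simply one record per (tool, task) with a human-checkable pass condition, starting from the scenarios listed above.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkCase:
    """One benchmark case: the tool it exercises, a short prompt,
    and a human-checkable pass condition."""
    tool: str
    prompt: str
    pass_condition: str

# Hypothetical starter suite: a couple of tasks per tool, reusing the
# scenarios discussed above. Tool names are illustrative assumptions and
# should be aligned with the MCP's real tool names.
BENCHMARK_SUITE: list[BenchmarkCase] = [
    BenchmarkCase(
        tool="manage_gameobject",
        prompt="Create a small garden: a ground plane plus ten simple plant shapes.",
        pass_condition="Plane exists; at least ten plant objects, none overlapping, all on the plane.",
    ),
    BenchmarkCase(
        tool="manage_gameobject",
        prompt="Lay out a simple city block with roads and a grid of box buildings.",
        pass_condition="Buildings sit on a grid and do not intersect the roads.",
    ),
    BenchmarkCase(
        tool="manage_scene",
        prompt="Build a start menu scene with a title, a Play button, and a Quit button.",
        pass_condition="A Canvas exists; both buttons are visible and anchored sensibly.",
    ),
    BenchmarkCase(
        tool="manage_script",
        prompt="Add player movement and enemy spawning scripts for a 2D Galaga clone.",
        pass_condition="Scripts compile with no console errors and are attached to objects.",
    ),
]

def summarize(suite: list[BenchmarkCase]) -> None:
    # Print each case so contributors can eyeball per-tool coverage.
    for case in suite:
        print(f"[{case.tool}] {case.prompt}")

if __name__ == "__main__":
    summarize(BENCHMARK_SUITE)
```

Vision input could slot in as an extra field per case (e.g. a reference screenshot to compare against), and batch execution could simply run the whole list in one session.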