CoplayDev/unity-mcp

LLM Execution Optimization and Benchmark Discussion


The major pain point I am experiencing is that the LLM does not have a spatial/contextual understanding of certain commands. This happens when I ask it to generate a populated city, a Dust 2 map prototype, or a garden. The user prompt is meant to stay simple, so I think we need to add a metaprompt and guidance for the LLM, especially around Unity execution, to make it smarter.

What I envision right now is a simple GUI input/button to start: the user describes their intent, and the LLM uses ManageScene to gather the current scene JSON and build an understanding of what to do. It can get more complex than that, but a starting point where the LLM knows the explicit, general goal would be good (a rough sketch of the flow is below).
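To make that concrete, here is a minimal sketch of what the flow could look like on the MCP server side, in Python. To be clear, this is not the project's actual API: `get_scene_snapshot` is a hypothetical stub standing in for a ManageScene call, and the guidance wording is just a placeholder.

```python
import json

# Hypothetical stub: in the real flow this data would come from the
# ManageScene tool; it is hard-coded here so the sketch is self-contained.
def get_scene_snapshot() -> dict:
    return {
        "sceneName": "SampleScene",
        "rootObjects": [
            {"name": "Main Camera", "position": [0, 1, -10]},
            {"name": "Directional Light", "position": [0, 3, 0]},
        ],
    }

# Guidance prepended to every request so the model plans layout
# (ground plane, spacing, parenting) before it starts creating objects.
UNITY_GUIDANCE = (
    "You are generating content in a Unity scene. "
    "Before creating objects, plan positions on the XZ ground plane, "
    "keep consistent spacing, group related objects under empty parents, "
    "and reuse primitives or prefabs instead of one-off meshes."
)

def build_metaprompt(user_intent: str) -> str:
    """Combine the fixed guidance, the current scene state, and the
    user's short intent into a single prompt for the LLM."""
    scene_json = json.dumps(get_scene_snapshot(), indent=2)
    return (
        f"{UNITY_GUIDANCE}\n\n"
        f"Current scene state (from ManageScene):\n{scene_json}\n\n"
        f"User intent: {user_intent}\n"
        "Produce a step-by-step plan of tool calls before executing them."
    )

if __name__ == "__main__":
    print(build_metaprompt("Generate a small garden with simple plant shapes."))
```

The point is just that the user's one-liner never reaches the model alone; it always arrives wrapped with the scene snapshot and the Unity-specific guidance.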

Also, it is hard to find a benchmark for this kind of testing scenario! We should think of possible test cases and generation tasks to exercise the tool in the future.

This could turn into a very long discussion about optimizing LLM execution; maybe Coplay has experience tackling this issue, but I am very open to the discussion.

Agreed, it's a very smart idea to take a "test-driven development" kind of approach, where we pick a few scenarios like you said and work on the system until it can handle them.

  • Garden with simple plant shapes is a great one.
  • Simple city is a good one.
  • Simple start menu is a good one (lots of people asking about UI).
  • 2D Galaga clone would be a more sophisticated one.

Any others?

Unity's included projects are good testing grounds as well!

It has been a while, but I want to pick this subject up again as I am working on batch & vision input for the MCP. Is there a way we can build a benchmark prompt together that tests the MCP's robustness, with each function exercised by 2-3 tasks? A rough sketch of what that could look like is below.
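To seed that discussion, here is a rough sketch of how such a suite could be structured, again in Python. The tool names and pass conditions are only my assumptions for illustration, not the MCP's actual tool set; the idea is simply one record per (tool, task) with a human-checkable pass condition, starting from the scenarios listed above.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkCase:
    """One benchmark case: the tool it exercises, a short prompt,
    and a human-checkable pass condition."""
    tool: str
    prompt: str
    pass_condition: str

# Hypothetical starter suite: a couple of tasks per tool, reusing the
# scenarios discussed above. Tool names are illustrative assumptions and
# should be aligned with the MCP's real tool names.
BENCHMARK_SUITE: list[BenchmarkCase] = [
    BenchmarkCase(
        tool="manage_gameobject",
        prompt="Create a small garden: a ground plane plus ten simple plant shapes.",
        pass_condition="Plane exists; at least ten plant objects, none overlapping, all on the plane.",
    ),
    BenchmarkCase(
        tool="manage_gameobject",
        prompt="Lay out a simple city block with roads and a grid of box buildings.",
        pass_condition="Buildings sit on a grid and do not intersect the roads.",
    ),
    BenchmarkCase(
        tool="manage_scene",
        prompt="Build a start menu scene with a title, a Play button, and a Quit button.",
        pass_condition="A Canvas exists; both buttons are visible and anchored sensibly.",
    ),
    BenchmarkCase(
        tool="manage_script",
        prompt="Add player movement and enemy spawning scripts for a 2D Galaga clone.",
        pass_condition="Scripts compile with no console errors and are attached to objects.",
    ),
]

def summarize(suite: list[BenchmarkCase]) -> None:
    # Print each case so contributors can eyeball per-tool coverage.
    for case in suite:
        print(f"[{case.tool}] {case.prompt}")

if __name__ == "__main__":
    summarize(BENCHMARK_SUITE)
```

Vision input could slot in as an extra field per case (e.g. a reference screenshot to compare against), and batch execution could simply run the whole list in one session.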