Digest is based on a partially ordered model schema blocking reproducibility

Question

Opened this issue 2 months ago · 3 comments

Prerequisites

I checked the documentation and found no answer to my problem
I checked the existing issues and made sure there are no similar bug reports

Expected Behavior

Digests should be based on completely sorted JSON, meaning that every object key, array, and nested object is sorted recursively. Currently, nested objects appear to be sorted by key, but arrays are not.
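A minimal sketch of what "completely sorted" could mean in practice. The helper name canonicalize is hypothetical and not part of uAgents, and sorting every array is an assumption taken from the request above: it is harmless for "required", but it would discard information in any array whose order carries meaning.

```python
import json


def canonicalize(value):
    """Recursively sort dict keys and list elements (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: canonicalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        # Compare elements by their JSON text so nested or mixed-type
        # elements can still be ordered deterministically.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


# Two schema fragments that differ only in key order and array order.
a = {"required": ["check", "message", "counter"], "type": "object"}
b = {"type": "object", "required": ["counter", "check", "message"]}

# After canonicalization both serialize to the same string.
assert json.dumps(canonicalize(a)) == json.dumps(canonicalize(b))
```

Hashing the canonicalized form would make the digest independent of field declaration order, at the cost of changing every existing digest.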

Observed Behavior

The digest is computed from a JSON representation of the model's OpenAPI schema. However, that JSON is only sorted by key.
For example, given the model

class SuperImportantCheck(Model):
    """Plus random docstring"""
    check: bool
    message: str
    counter: int

When Model.build_schema_digest() is called, it runs schema = model.schema_json(indent=None, sort_keys=True).
For the SuperImportantCheck model above, schema equals

{"description": "Plus random docstring", "properties": {"check": {"title": "Check", "type": "boolean"}, "counter": {"title": "Counter", "type": "integer"}, "message": {"title": "Message", "type": "string"}}, "required": ["check", "message", "counter"], "title": "SuperImportantCheck", "type": "object"}

Notably, schema.required is an array that keeps the fields in declaration order, ["check", "message", "counter"], rather than in sorted order.

Because the array order in the schema JSON follows the model definition, any other agent that wants to interact with this agent must declare its Model fields in the same order for the digests to match.
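The effect can be reproduced without uAgents or pydantic. The sketch below is a stand-in, not the library code: digest_of and make_schema are hypothetical helpers, json.dumps is assumed to behave like schema_json with the same arguments, and the schema is a simplified version of the one shown above (no description field).

```python
import hashlib
import json


def digest_of(schema: dict) -> str:
    # Same steps as the quoted logic: key-sorted JSON text, then SHA-256.
    text = json.dumps(schema, indent=None, sort_keys=True)
    return "model:" + hashlib.sha256(text.encode("utf8")).hexdigest()


def make_schema(field_order):
    # Build a simplified schema whose "required" array follows field_order.
    types = {"check": "boolean", "message": "string", "counter": "integer"}
    return {
        "title": "SuperImportantCheck",
        "type": "object",
        "properties": {
            name: {"title": name.capitalize(), "type": types[name]}
            for name in field_order
        },
        "required": list(field_order),
    }


digest_a = digest_of(make_schema(["check", "message", "counter"]))
digest_b = digest_of(make_schema(["counter", "check", "message"]))

# sort_keys=True normalizes "properties" but leaves "required" untouched,
# so the same fields declared in a different order hash differently.
print(digest_a == digest_b)  # False
```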

To Reproduce

Modify uAgents/python/src/model.py so that build_schema_digest prints the schema before hashing it:

    @staticmethod
    def build_schema_digest(model: Union["Model", Type["Model"]]) -> str:
        schema = model.schema_json(indent=None, sort_keys=True)
        print(schema)  # added to inspect the exact JSON that gets hashed
        digest = hashlib.sha256(schema.encode("utf8")).digest().hex()

        return f"model:{digest}"

Then run python python/tests/test_model.py

Version

v0.17.0

Answer 1 · 2024-10-25T07:52:26.000Z

Thank you for opening this issue.

As you may have seen in the branches section or in PR #374, to name one example, we are well aware of this topic.
Unfortunately, we have been holding off on changing how the digest is created for the sake of backwards compatibility. Introducing a new way of calculating the digest would break compatibility for many existing agents, so we need to be careful when rolling out such a change.

To minimise the number of breaking changes, we're working hard in-house on a better solution for manifests in general, which in turn will affect how the digest is created.

Coming back to your observation and what you wrote under "Expected Behavior":

Digests should be based on completely sorted json.

That is an assumption, probably caused by the nature of our documentation. I'm convinced that nowhere in the docs do we state that Models used for agent-to-agent communication can differ in structure or order. It was always meant to be this way to allow for some flexibility, but at the same time you'll never see examples or integrations in which two agents that want to communicate don't share the exact same model definition (including order).

So for me that is an implicit definition and I want to apologise for being unclear in that regard.
We'll do our best to come up with a new way of treating the manifest in the near future and hope this will not block your progress or project in any way.

Answer 2 · 2024-10-25T07:52:46.000Z

Leaving this issue open as a reminder

Answer 3 · 2024-10-26T05:24:01.000Z

I had the same concerns regarding backward compatibility. I ran into this issue while implementing the digest in an npm package. To ensure that our agents are compatible with existing agents, we're implementing a digest that should be identical to the Python digest. I would like your opinion on our digest implementation. We can continue the discussion at #539. Current progress on our implementation can be found at https://github.com/Luceium/uAgents-NPM/blob/digest/src/model.ts#L74.
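For a port in another language to produce an identical digest, it has to reproduce the exact bytes that Python hashes, not just the same JSON value. The sketch below assumes schema_json forwards its arguments to the standard json.dumps, which the spacing in the schema output quoted in the question suggests; the sample schema is made up for illustration.

```python
import json

# Hypothetical schema fragment with a non-ASCII character in the title.
schema = {"title": "Café", "required": ["check", "message"]}

# With indent=None, json.dumps separates items with ", " and ": ",
# and by default escapes non-ASCII characters as \uXXXX sequences.
python_text = json.dumps(schema, indent=None, sort_keys=True)
print(python_text)

# A compact, unescaped serialization is the same JSON value but a
# different byte string, so it would hash to a different digest.
compact_text = json.dumps(
    schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False
)
print(compact_text)

assert json.loads(python_text) == json.loads(compact_text)
assert python_text != compact_text
```

JavaScript's JSON.stringify emits the compact, unescaped form by default, so a TypeScript implementation would need to add the spaces and escapes itself to match.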
