ipld/edelweiss

add type ByteString for backwards compat with IPLD schema

petar opened this issue · 5 comments

petar commented

The Edelweiss type String by definition holds only valid Unicode strings, and encodes/decodes them to and from IPLD strings with valid UTF-8 encodings.

However, there may be pre-existing IPLD schemas that place non-UTF-8 byte sequences in IPLD string objects.
By design requirement, Edelweiss must provide a way to work with pre-existing schemas.

One way of doing this without violating Edelweiss's type semantics is to introduce a new Edelweiss type, called, say, ByteString, which:

  • is a list of bytes on the user-facing end
  • encodes/decodes as an IPLD string of arbitrary bytes on the wire
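
To make the proposal concrete, here is a minimal Go sketch of the two behaviours being contrasted. It is not the Edelweiss implementation: the names `ByteString`, `EncodeIPLDString`, `DecodeByteString` and `DecodeString` are hypothetical, and a plain Go `string` (which may hold arbitrary bytes) stands in for an IPLD string node.

```go
package main

import (
	"errors"
	"unicode/utf8"
)

// ByteString is a hypothetical sketch of the proposed type:
// a plain list of bytes on the user-facing end.
type ByteString []byte

// EncodeIPLDString returns the wire-side value. A Go string can carry
// arbitrary bytes, so nothing is validated or rewritten here.
func (b ByteString) EncodeIPLDString() string {
	return string(b)
}

// DecodeByteString accepts any string payload, valid UTF-8 or not.
func DecodeByteString(s string) ByteString {
	return ByteString(s)
}

// DecodeString mimics the existing String semantics described above:
// payloads that are not valid UTF-8 are rejected.
func DecodeString(s string) (string, error) {
	if !utf8.ValidString(s) {
		return "", errors.New("string payload is not valid UTF-8")
	}
	return s, nil
}
```

With these definitions, a payload such as the bytes `0xff 0xfe` round-trips unchanged through `ByteString`, while `DecodeString` rejects it.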

vmx commented

Is this a general IPLD Schema problem (then it should really be fixed) or a Go Schema implementation detail?

petar commented

> Is this a general IPLD Schema problem (then it should really be fixed) or a Go Schema implementation detail?

@vmx I've updated the description. Maybe it was a bit confusing previously.

By the way, there is a new set of slides that documents Edelweiss in its current state (Milestone 1): https://github.com/ipld/edelweiss/tree/main/doc/slides
This may be helpful too.

vmx commented

> encodes/decodes as an IPLD string of arbitrary bytes on the wire

I guess one major serialization will be CBOR. Then this won't work. In CBOR, text strings need to be valid UTF-8; otherwise the result is invalid, non-spec-compliant CBOR.
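
The constraint can be illustrated with a small hand-rolled Go sketch. It is not taken from any IPLD or CBOR library: `cborHead`, `EncodeText` and `EncodeBytes` are hypothetical helpers that follow the RFC 8949 layout, where major type 3 (text string) requires valid UTF-8 and major type 2 (byte string) accepts any bytes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"unicode/utf8"
)

const (
	majorBytes byte = 2 // CBOR byte string: arbitrary bytes allowed
	majorText  byte = 3 // CBOR text string: must be valid UTF-8
)

// cborHead builds the initial byte(s): the major type in the top 3 bits,
// followed by the payload length in the shortest available form.
func cborHead(major byte, n uint64) []byte {
	switch {
	case n < 24:
		return []byte{major<<5 | byte(n)}
	case n <= 0xff:
		return []byte{major<<5 | 24, byte(n)}
	case n <= 0xffff:
		out := []byte{major<<5 | 25, 0, 0}
		binary.BigEndian.PutUint16(out[1:], uint16(n))
		return out
	case n <= 0xffffffff:
		out := make([]byte, 5)
		out[0] = major<<5 | 26
		binary.BigEndian.PutUint32(out[1:], uint32(n))
		return out
	default:
		out := make([]byte, 9)
		out[0] = major<<5 | 27
		binary.BigEndian.PutUint64(out[1:], n)
		return out
	}
}

// EncodeText refuses payloads that are not valid UTF-8, since emitting
// them as major type 3 would produce non-compliant CBOR.
func EncodeText(s string) ([]byte, error) {
	if !utf8.ValidString(s) {
		return nil, errors.New("not valid UTF-8: cannot be a CBOR text string")
	}
	return append(cborHead(majorText, uint64(len(s))), s...), nil
}

// EncodeBytes emits a byte string, which has no UTF-8 requirement.
func EncodeBytes(b []byte) []byte {
	return append(cborHead(majorBytes, uint64(len(b))), b...)
}
```

Under this sketch, a payload like the single byte `0xff` can only be written as a CBOR byte string; `EncodeText` returns an error for it.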

petar commented

If this is the case, it suggests a design bug in the IPLD data model: if the data model allows arbitrary bytes in a string (which I believe it does), then this breaks the contract that IPLD values can be serialized to any backend (e.g. both DAG-JSON and DAG-CBOR).

vmx commented

The IPLD data model is independent of the serialization, so there could potentially be serializations that support that; we just don't currently have any such serialization formats.