add type ByteString for backwards compat with IPLD schema

Question

petar opened this issue 3 years ago · 5 comments

By definition, the Edelweiss type String holds only valid Unicode strings, and it encodes to and decodes from IPLD strings that are valid UTF-8.

However, there may be pre-existing IPLD schemas that place non-UTF-8 byte sequences in IPLD string objects.
By design requirement, Edelweiss must provide a way to work with pre-existing schemas.

One way of doing this without violating Edelweiss's type semantics is to introduce a new Edelweiss type, called, say, ByteString, which:

- is a list of bytes on the user-facing end
- encodes/decodes as an IPLD string of arbitrary bytes on the wire
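
To make the proposal concrete, here is a minimal Go sketch of the distinction. All names (`ByteString`, `ToWire`, `ByteStringFromWire`, `StringFromWire`) are hypothetical and are not the Edelweiss API; a plain Go `string` stands in for an IPLD string node, which works for illustration because Go strings may hold arbitrary bytes.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// ByteString is a hypothetical sketch of the proposed type: a list of bytes
// on the user-facing side. It is not the actual Edelweiss definition.
type ByteString []byte

// ToWire returns the bytes as a Go string, standing in for an IPLD string
// node. No UTF-8 validation is performed.
func (b ByteString) ToWire() string { return string(b) }

// ByteStringFromWire accepts any string payload, valid UTF-8 or not.
func ByteStringFromWire(s string) ByteString { return ByteString(s) }

// StringFromWire models the stricter String semantics described above:
// it rejects payloads that are not valid UTF-8.
func StringFromWire(s string) (string, error) {
	if !utf8.ValidString(s) {
		return "", errors.New("string is not valid UTF-8")
	}
	return s, nil
}

func main() {
	legacy := "\xff\xfehello" // a payload that is not valid UTF-8

	_, err := StringFromWire(legacy)
	fmt.Println("String decode rejected:", err != nil)

	bs := ByteStringFromWire(legacy)
	fmt.Println("ByteString round-trips:", bs.ToWire() == legacy)
}
```

The point of the sketch is only that the two types can share a wire representation while differing in what they promise to the user: String guarantees valid Unicode, ByteString guarantees nothing beyond the bytes themselves.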

Answer 1 · 2022-03-11T15:48:59.000Z

Is this a general IPLD Schema problem (in which case it should really be fixed) or a Go Schema implementation detail?

Answer 2 · 2022-03-21T14:37:52.000Z

> Is this a general IPLD Schema problem (in which case it should really be fixed) or a Go Schema implementation detail?

@vmx I've updated the description. Maybe it was a bit confusing previously.

By the way, there is a new set of slides that documents Edelweiss in its current state (Milestone 1): https://github.com/ipld/edelweiss/tree/main/doc/slides
This may be helpful too.

Answer 3 · 2022-03-21T16:59:59.000Z

> encodes/decodes as an IPLD string of arbitrary bytes on the wire

I guess one major serialization will be CBOR, and then this won't work. In CBOR, strings need to be valid UTF-8; otherwise the result is invalid, non-spec-compliant CBOR.
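
As a rough illustration of that constraint, the Go sketch below hand-encodes a CBOR text string (major type 3) and refuses input that is not valid UTF-8. The helper `encodeShortCBORText` is hypothetical and deliberately limited to strings shorter than 24 bytes, so that the length fits in the initial byte; it is not taken from any CBOR library.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// encodeShortCBORText is a hypothetical helper that encodes s as a CBOR
// text string (major type 3). It only handles lengths below 24, where the
// length is stored directly in the low 5 bits of the initial byte.
func encodeShortCBORText(s string) ([]byte, error) {
	if !utf8.ValidString(s) {
		return nil, errors.New("CBOR text strings must be valid UTF-8")
	}
	if len(s) >= 24 {
		return nil, errors.New("sketch only handles strings shorter than 24 bytes")
	}
	// 0x60 is major type 3 (binary 011) shifted into the top 3 bits.
	out := []byte{0x60 | byte(len(s))}
	return append(out, s...), nil
}

func main() {
	// "hi" encodes to the three bytes 0x62 0x68 0x69.
	encoded, err := encodeShortCBORText("hi")
	fmt.Printf("% x %v\n", encoded, err)

	// Arbitrary bytes are rejected rather than written as a text string.
	_, err = encodeShortCBORText("\xff\xfe")
	fmt.Println(err)
}
```

A compliant encoder would have to either reject such a payload, as here, or write it as a CBOR byte string (major type 2) instead, at which point it would no longer decode as a string.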

Answer 4 · 2022-03-21T17:13:42.000Z

If that is the case, it suggests a design bug in the IPLD data model: if the data model allows arbitrary bytes in a string (which I believe it does), then this breaks the contract that IPLD values can be serialized to any backend (e.g. both DAG-JSON and DAG-CBOR).

Answer 5 · 2022-03-21T17:29:06.000Z

The IPLD data model is independent of the serialization, so there could potentially be serializations that support this. We just don't currently have any such serialization format.