w3c/activitystreams

Force non-functional properties to always be arrays

trwnh opened this issue ยท 17 comments

trwnh commented

Please Indicate One:

  • Editorial
  • Question
  • Feedback
  • Blocking Issue
  • Non-Blocking Issue

Please Describe the Issue:


Proposed change

Add "@container": "@set" to non-functional properties in order to force them to be arrays even after compaction.

Motivation

This would save plain JSON consumers from having to account for single-value arrays being expressed as single string values, forcing an array during compaction. Consumers would then only have to account for IRI representations vs inlined representations.

It would also make it clearer which properties are functional and which ones are not. Currently, many implementations wrongly assume that certain properties are functional and can only have a single value, when in reality they are actually sets that only often have a single value inside them.

In particular, items vs orderedItems has recently come up as a source of confusion. orderedItems is defined as a @container: @list and therefore is always an array. But items is not defined explicitly as a @container: @set, which makes it valid to use a single value instead of a single-value array. Consequently, GoToSocial (relying on Go-Fed) fell into the mistake of assuming that orderedItems could likewise be a single value, because items can be a single value.

Similarly commonly, several implementations fail to account for multiple type values. This leads to situations where implementations include an extension type that somewhat overlaps with a core type, but then don't include the core type as they are normatively required to do in AS2-Core. The following language is repeated 3 times in AS2-Core sections about modeling:

When an implementation uses an extension type that overlaps with a core vocabulary type, the implementation MUST also specify the core vocabulary type.

I believe it would lead to greater conceptual consistency to do away with the ambiguity around single values vs single-value sets. The default JSON-LD behavior of coercing single-value sets into single values is not very useful and mainly serves as a pain point, as yet another case to handle for non-LD-aware processors.

This would be a breaking change and I believe out of scope for our current conception of ActivityStreams extensions or editorial clarifications

In particular, items vs orderedItems has recently come up as a source of confusion. orderedItems is defined as a @container: @list and therefore is always an array.

That said, this particular issue is a pretty small change and seems unlikely to break backwards compatibility for any major implementation, so maybe we could make a small change here to ease this particular pain point

evanp commented

I can see some reasons to include this breaking change. However, it would require quite a lot of effort on the part of implementers, specification writers, the W3C, etc.

It sounds like the main concern is implementations that can't handle multiple values in these properties. While I agree that making it easier for them to do so would be helpful, I'm not sure the effort saved on the part of these implementers outweighs the effort needed to make a breaking change.

Could we reach out instead to GoToSocial to check that they are publishing valid AS2 documents? And, in parallel, reach out to the implementations that don't support multiple types?

I don't think this is ready to close. I'm going to self-assign for further discussion.

evanp commented

It looks like GoToSocial fixed this bug recently:

superseriousbusiness/gotosocial#1673

evanp commented

Also, I feel pretty uncomfortable requiring every "type": "Note" to be "type": ["Note"]. That seems like an extreme fix for a conceptual problem; it may be hard to get implementers to comply; and it makes hundreds of millions of activity objects invalid.

trwnh commented

If it's "uncomfortable" to forcefully disambiguate this, then perhaps it might be "enough" to add a very prominent note or callout in the spec docs then? Something like the warning about Public and as:Public and https://www.w3.org/ns/activitystreams#Public all being equivalent. It should warn that almost every property can be either a single value, a single reference, a hash, an array of values, an array of references, an array of hashes, or a mixed array of any combination of those three things.

But... that's a lot of possible representations, isn't it? So maybe it's not so extreme to get rid of several of those... especially where it leads to fragility by implementers making incorrect assumptions. If all of those are "valid", then that's a lot of complexity that "plain JSON" software has to deal with. I think a breaking change to remove extreme complexity isn't so extreme itself.

In my opinion, either a lot of outreach needs to be done to get non-LD implementers to support arrays in all cases where they are valid... or we move to make lists and sets be unambiguous. I'm not very sure in the long run it will be that big of a break. However, I am more sure that the outreach needed will only continue to grow and grow as more implementations are written and written poorly.

evanp commented

We created Primer pages on the terms and complex properties that implementers will see, along with non-normative guidance for publishers and consumers.

trwnh commented

In particular, items vs orderedItems has recently come up as a source of confusion. orderedItems is defined as a @container: @list and therefore is always an array.

That said, this particular issue is a pretty small change and seems unlikely to break backwards compatibility for any major implementation, so maybe we could make a small change here to ease this particular pain point

ah, i missed this bit from @nightpool -- are you suggesting items gets explicitly defined as @container: @set to make it always an array, just like orderedItems is always an array? because i suppose that is the one that is probably most sensible to make

this, combined with the prominent warning i mentioned above as "enough" might be enough to consider the issue "resolved"

are you suggesting items gets explicitly defined as @container: @set to make it always an array, just like orderedItems is always an array?

yep ๐Ÿ‘

I assume the whole point of the requirement of compacted form is to avoid requiring consumers to understand the whole JSON-LD concepts, but I think the requirement of stripping/wrapping the array is instead imposing a JSON-LD trivia both on producers and on consumers with no particular practical values, besides saving a couple of bytes each in uncompressed data.

While changing the normative context is technically a breaking change, some of the non-functional terms, including to, cc and tag (in addition to items, which is pointed out in OP), are unconditionally serialized as an array by some major implementations out there, and I believe adjusting the definitions of such terms would make the spec consistent with the de facto standard behavior rather than breaking existing implementations.

Although the change would require existing conforming producing implementations to change the behavior, I expect that as for these terms, the array form is compatible with more number of consumers, and that the change would benefit those implementations in practice. The most painful point of the current definition IMO is that simply applying the compaction algorithm to be conformant with the spec would actually result in a less compatible implementation, which cannot be fully resolved by informative guidances. What is worse, the status quo is penalizing conforming producers rather than non-conforming ones, but I digress.

To give another instance, a lot of the example Activity Streams documents in the related Recommendations are written in a way as if the terms were defined as "@container": "@set". A manual skim told that the instances include Example 6, 20 in Activity Streams 2.0, Example 66, 67, 70, 71, 72, 104, 105, 108 in Activity Vocabulary, Example 2, 3, 4, 5, 8, 15, 16 in ActivityPub.

While these might deserve errata, declaring that this large number of errors are just "errata" doesn't feel quite appropriate to me. I would rather argue that the fact that even the spec authors didn't handle it properly to this extent is a reflection of a misdesign in the normative context.

Anyway, these examples, combined with the behavior of real-world implementations, provide us with a set of good candidate terms for the change; namely, I suppose it would be safe to change the terms used for audience targeting (to, bto, cc and bcc (and perhaps audience too?)), items and tag.

Well, sorry for changing the opinion over and over, but I've got a concern about changing the normative context document.

Namely, if there's an implementation that relies on the compaction algorithm in a sort of canonicalization process for e.g. hashing objects, the change would severely break that implementation.

I don't know a real-world example of such an implementation: AFAICT RsaSignature2017 used something like URDNA2015 (according Mastodon's source code. I couldn't find https://w3c-dvcg.github.io/lds-rsa2017/ even in the CG's GitHub organization nor Wayback Machine) whose result is serialization-agnostic and FEP-8b32 (Object Integrity Proofs) doesn't apply compaction even though it primarily uses JCS. But I think it's still worth considering.

If the normative context is modified in this way, will that possibly produce empty arrays in compacted documents? If so, how is that related to the mandatory AS2 requirement that:

If a property has an array value, the absence of any items in that array must be represented by omitting the property entirely or by setting the value to null. The appropriate interpretation of an omitted or explicitly null value is that no value has been assigned as opposed to the view that the given value is empty or nil. -- Activity Streams 2.0, Section 2. Serialization (emphasis in the original document).

If the normative context is modified in this way, will that possibly produce empty arrays in compacted documents?

The proposed context can can produce an empty array when compacting an empty array, but so does the current context. And I don't think there would be a new case of compaction to an empty array introduced by the change.

Consider the following serialization of graphs:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "@graph": [
    // Multiple values
    { "items": ["https://example.com/objects/1", "https://example.com/objects/2"] },
    // Single value
    { "items": "https://example.com/object" },
    // Empty array
    { "items": [] },
    // null
    { "items": null },
    // Not specified at all
    {}
  ]
}

With the current context, they compact to the following:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "@graph": [
    {
      "items": [
        "https://example.com/objects/1",
        "https://example.com/objects/2"
      ]
    },
    {
      "items": "https://example.com/object"
    },
    {
      "items": []
    }
  ]
}

(JSON-LD Playground)

And with the proposed change, the graphs compact to the following:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "items": {
        "@id": "as:items",
        "@container": "@set"
      }
    }
  ],
  "@graph": [
    {
      "items": [
        "https://example.com/objects/1",
        "https://example.com/objects/2"
      ]
    },
    {
      "items": [
        "https://example.com/object"
      ]
    },
    {
      "items": []
    }
  ]
}

(JSON-LD Playground)

The proposed context can can produce an empty array when compacting an empty array, but so does the current context.

I think we need some clarification about the non-empty array requirement. I don't know the rationale for it or if it's a requirement that is intended to further constrain JSON-LD compacted output.

@tesaguri and @steve-bate I think you drifted into some other topics. Would you mind adding new issues for the topics you raised? I think they are important but probably don't belong on this somewhat narrow issue.

I've marked this issue as "Next version" because it is a normative change that would require a change in the AS2 docs. I'm going to close it for now, but we'll use closed next-version issues as input for the WG process.

While going through old issues for something else I encountered this by @erincandescent from 2014: #24 (comment)

we have to remember that people will end up implementing things, accidentally or through laziness which assume only a single value for a given property. Therefore, we should express care when adding such properties.

...which I think clearly summarizes the contention here.