w3c/activitystreams

owl file vs jsonld context

sandhawke opened this issue · 25 comments

Please Indicate One:

  • Editorial
  • Question
  • Feedback
  • Blocking Issue
  • Non-Blocking Issue

Please Describe the Issue:

In the drafts, Core linked to ./activitystreams2-context.jsonld and Vocabulary linked to ./activitystreams2.owl

It seems to me both of those documents need to share https://www.w3.org/ns/activitystreams, if we're going to follow Linked Data principles.

  1. Anyone want to convert https://github.com/w3c/activitystreams/blob/master/vocabulary/activitystreams2.owl to JSON-LD and syntactically merge it into the context file? I'm not sure where the master of the context lives right now.

  2. Will that cause any problems? The context file will be bigger. I suppose we could use owl:import, although that might not work with as many tools.
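The syntactic merge in question 1 could be sketched roughly as follows, assuming the vocabulary has already been converted to JSON-LD (e.g. with a tool like rdf-translator). The term names, prefixes, and document contents below are illustrative placeholders, not the real AS2 files; the idea is just to carry both the `@context` and the vocabulary `@graph` in one document served from the namespace URL.

```python
import json

# Illustrative stand-ins for the real files: the published AS2 context
# document and the vocabulary converted from activitystreams2.owl.
context_doc = {
    "@context": {
        "as": "https://www.w3.org/ns/activitystreams#",
        "Activity": "as:Activity",
    }
}

vocab_doc = {
    "@graph": [
        {"@id": "as:Activity", "@type": "owl:Class"},
    ]
}

def merge(context_doc, vocab_doc):
    """Return one JSON-LD document carrying both the context and the vocab."""
    merged = dict(context_doc)
    merged["@graph"] = vocab_doc.get("@graph", [])
    return merged

merged = merge(context_doc, vocab_doc)
print(json.dumps(merged, indent=2))
```

This is only the shape of the merge; in practice the prefixes used in the `@graph` would need to be declared in the `@context`, and bnode handling would need attention (see below).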

👍

@akuckartz thoughts on owl:import? Do you know a tool that would convert the ontology to reasonable json-ld?

I was confused by only getting the JSON-LD context when looking up https://www.w3.org/ns/activitystreams ($ curl -LH "accept:application/ld+json" https://www.w3.org/ns/activitystreams) as I expected an RDFS or OWL file. When I did not get RDFS/OWL I concluded that no such representation of the vocab exists.

Then I read in the AS Vocabulary definition:

A non-normative turtle definition of the Activity Streams 2.0 vocabulary is provided here and/or at the namespace

Great to see that the RDFS/OWL version exists but unfortunately the spec incorrectly says that you can fetch it from the namespace.

How to best publish a vocabulary alongside its context is a general question. As for an answer, one could also do this:

  1. Provide the OWL file via the namespace
  2. Link to the context via a link header

2.) is also done by schema.org:

$ curl -I https://schema.org
HTTP/2 200 
access-control-allow-credentials: true
access-control-allow-headers: Accept
access-control-allow-methods: GET
access-control-allow-origin: *
access-control-expose-headers: Link
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json
...
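For a client, consuming that `Link` header takes only a small amount of parsing. A minimal sketch (single-link headers only, no support for multiple comma-separated links):

```python
import re

def parse_link_header(value):
    """Parse a simple HTTP Link header value such as
    </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
    into (target, params). Handles only the single-link case."""
    m = re.match(r'\s*<([^>]*)>\s*(.*)', value)
    if not m:
        raise ValueError("not a Link header")
    target, rest = m.group(1), m.group(2)
    params = dict(re.findall(r';\s*([a-zA-Z]+)="?([^";]*)"?', rest))
    return target, params

target, params = parse_link_header(
    '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
)
```

The target is relative, so a client would resolve it against the request URL before fetching the context.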

However, this approach has some drawbacks and will probably break applications that get the context directly from https://www.w3.org/ns/activitystreams.

The approach outlined in this issue seems to be the better option. If I understand it correctly, you want to publish under the namespace not only the JSON-LD context but in addition the whole RDFS/OWL vocabulary as JSON-LD. I quickly put one version of this together with the help of https://rdf-translator.appspot.com & https://json-ld.org/playground/: https://gist.github.com/acka47/f78b0a7803e0c5ffdac171551f6fe638

Is this the thing you imagine (probably without the bnode IDs and some nested objects instead)?

Is this the thing you imagine (probably without the bnode IDs and some nested objects instead)?

That's great!

Another approach, used by other W3C namespaces, is HTTP content negotiation to return either the JSON-LD context or the OWL file. For example:

curl -LH "accept:application/ld+json" https://www.w3.org/ns/ldp

Returns a JSON-LD context. Whereas

curl -LH "accept:text/turtle" https://www.w3.org/ns/ldp

Returns the full vocabulary as Turtle.

It would be very nice to have this working for ActivityStreams as well. This would allow using the Vocabulary without doing a significant amount of JSON-LD processing.

Slightly off-topic (should probably be a new issue):

Great to see that the RDFS/OWL version exists but unfortunately the spec incorrectly says that you can fetch it from the namespace.

Unfortunately this is outdated. Some things I think are outdated or missing:

  • the inbox property is missing (and some others)
  • uses http instead of https as base uri

the inbox property is missing (and some others)

The inbox property is taken from the Linked Data Platform vocabulary: https://www.w3.org/ns/ldp#inbox (see also $ curl -H "accept:application/ld+json" https://www.w3.org/ns/activitystreams | grep -A 2 inbox)

uses http instead of https as base uri

I think there already was some discussion about this but I can not find it.

evanp commented

I will work on ensuring that the JSON-LD context is the master document, and will automatically generate the OWL document if possible. I'll need to check with W3C contacts to see if we can set up the content negotiation.

the inbox property is missing (and some others)
uses http instead of https as base uri

The vocab URIs in the LDP ns use the http: URI scheme. The current AS JSON-LD context is correct.


What Sandro was suggesting is that the vocab and the JSON-LD context are available from the same URI ( https://www.w3.org/ns/activitystreams ). This is practical and AFAIK, https://www.w3.org/ns/activitystreams is the base URI for terms that's used by RDF publishers and consumers.

However, I may be wrong but if the same URI is used for the vocab and the JSON-LD context, it may be a URI collision (see #504 (comment) )

Generating Turtle for the vocab from the JSON-LD context will not produce anything, since the JSON-LD context does not currently include any definitions.

So, I'm a bit puzzled about what's the best course of action, and was hoping for guidance from W3C and everyone. It'd be great for me to be wrong about the collision.. and that the definitions can be rolled into the representation available from https://www.w3.org/ns/activitystreams and have conneg for application/ld+json, and text/turtle.

So, I'm a bit puzzled

I'm a little perplexed, however. Isn't it true that all ActivityStreams objects reference a vocabulary via the @context, and that currently, this vocabulary is absent?

have conneg for application/ld+json, and text/turtle

Regarding the implementation of content negotiation for application/ld+json and text/turtle, I have some reservations. We've seen numerous challenges over the years in implementing content negotiation correctly, including within Solid. Do you think it's feasible to execute this without issues within a reasonable timeframe?

I'm not aware of implementations having issues with conneg in Solid. If you can follow-up on that with links in the Solid CG, that'd be helpful.

AFAIK, the W3C server has been correctly responding to ns/ conneg requests for a long time now, e.g.:

$ curl -iLH'Accept: text/turtle' http://www.w3.org/ns/oa
$ curl -iLH'Accept: application/ld+json' http://www.w3.org/ns/oa

And when the server can't provide a requested representation, it properly responds with 406 and suggests the available variants, e.g.:

$ curl -iLH'Accept: text/foobarbazqux' http://www.w3.org/ns/oa

HTTP/2 406
content-type: text/html; charset=iso-8859-1
...
<ul>
<li><a href="oa.html">oa.html</a> , type text/html</li>
<li><a href="oa.rdf">oa.rdf</a> , type application/rdf+xml</li>
<li><a href="oa.ttl">oa.ttl</a> , type text/turtle</li>
<li><a href="oa.jsonld">oa.jsonld</a> , type application/ld+json</li>
<li><a href="oa.json">oa.json</a> , type application/json</li>
</ul>

So, yes, W3C can set up conneg and it is not an issue.
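The server-side choice sketched above can be illustrated with a toy variant-selection function. The media types and file names mirror the `ns/oa` example; the matching is deliberately simplified (no q-value sorting, no partial wildcards like `text/*`), so this is a sketch of the idea, not of the actual W3C server configuration.

```python
# Toy content negotiation: pick a representation from the Accept header,
# or signal 406 together with the available variants.
VARIANTS = {
    "text/html": "oa.html",
    "application/rdf+xml": "oa.rdf",
    "text/turtle": "oa.ttl",
    "application/ld+json": "oa.jsonld",
    "application/json": "oa.json",
}

def select_variant(accept_header, variants=VARIANTS):
    """Return (status, variant) or (406, sorted list of available types)."""
    # Strip parameters like ";q=0.9" and match in the order the client listed.
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in requested:
        if media_type in variants:
            return 200, variants[media_type]
        if media_type == "*/*":
            return 200, next(iter(variants.values()))
    return 406, sorted(variants)
```

A real implementation would honour q-values and wildcard ranges per RFC 9110; the point here is just that the 200 and 406 branches above reproduce the behaviour shown in the curl transcripts.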

As I see it, the core concern boils down to whether it is correct or meaningful to have a resource in which the JSON-LD representation includes the JSON-LD context and the vocab (information in OWL and/or RDFS).

If the above is okay, what Evan is trying to do makes sense to me. AFAICT, the W3C Team contact for SWICG is @pchampin -- (not) sorry to summon you =)

A JSON-LD document can be the source document including the JSON-LD context and the vocab (which includes the human-readable labels), where the other formats like Turtle and possibly HTML can be generated. If the vocab is not part of the source JSON-LD document, there is no point in generating Turtle. (Aside: RDF/XML is not needed.)

See also the HTML representation of ns/activitystreams which includes content about the JSON-LD context and the vocab (but it actually links to the spec):

$ curl -LH'Accept: text/html' https://www.w3.org/ns/activitystreams

But again, conneg is fine:

$ curl -LH'Accept: application/ld+json' https://www.w3.org/ns/activitystreams

It is the content of the representations that needs to be sorted out, and whether there are URI collisions or not.

I'm not aware of implementations having issues with conneg ...

Surely all mime types should give back the same data (ie triples) in conneg, including HTML?

I hope you don't mind but I prefer to continue this conneg discussion elsewhere. I'll respond briefly and leave it at that from my end.

As per RFC 7231 (or RFC 9110), "sameness" (equivalent representations) is determined by the algorithm selecting the response, and surely the URI owner, as part of social convention, ensures the semantics of the resource are preserved (as intended) across representations over time. So it does not strictly entail that the "same" data/triples is literally required to be part of all representations; that is not always possible anyway.

As per RDF11-CONCEPTS (with concrete RDF syntaxes), implementations can (and do) practically provide the same data within the framework of HTTP. There are social conventions: we wouldn't normally expect or consider it correct for PNG and HTML representations to be the "same" (although some may argue that), whereas we would expect PNG and JPEG to be equivalent. Similarly, the RDF graph in JSON-LD and Turtle may not be isomorphic, and that's okay. The same goes for Turtle and HTML with or without embedded RDF. All of those combinations can be considered equivalent representations.

Similarly, the RDF graph in JSON-LD and Turtle may not be isomorphic, and that's okay

If JSON-LD and Turtle are giving back different triples you have a BIG problem, because the client won't know which of the serializations to take. Different serializations giving different results means inconsistent UX, security issues, a wide attack surface, etc.

(JSON-LD would for example allow multiple named graphs, and so that's not going to be encodable in Turtle.)

I still think this sub-thread is beyond the scope of this issue (and SWICG working on it). Most fruitful places to continue considerations on round tripping would be JSON-LD WG or RDF-DEV CG, and tangentially HTTP WG re "sameness".

Yes, indeed, it seems evident that the task of developing bug-free content negotiation can be challenging for the average web developer, illustrating the high complexity of the process.

I'm a little perplexed, however. Isn't it true that all ActivityStreams objects reference a vocabulary via the @context, and that currently, this vocabulary is absent?

no, there is no normative machine readable vocabulary for ActivityStreams, only a normative context file. The OWL file is informational only, and implementations are free to use or not use it as they wish.

For this reason I'm somewhat against putting the OWL file at the same URL as the normative context.

Yes, indeed, it seems evident that the task of developing bug-free content negotiation can be challenging for the average web developer, illustrating the high complexity of the process.

This is a thing you've said here and on the mailing list multiple times, but I still fail to see any evidence for this statement. Content negotiation may not be appropriate for this document in particular, but I believe it's a useful tool in general, and e.g. ActivityPub makes good use of it. We currently use content negotiation to serve a human-readable HTML and a machine-readable JSON-LD version of the same context document, and I have yet to hear of a single bug reported to the CG due to that.

Yes but this is a bug in the mastodon implementation, because in the semantic web you must return the same data for each mime type

Luckily neither activitypub nor activitystreams is the "semantic web". activitypub does not mandate this, activitystreams does not mandate this, and I do not think it is a reasonable nor useful mandate.

Yes but this is a bug in the mastodon implementation, because in the semantic web you must return the same data for each mime type

Luckily neither activitypub nor activitystreams is the "semantic web". activitypub does not mandate this, activitystreams does not mandate this, and I do not think it is a reasonable nor useful mandate.

@nightpool You've successfully developed a functioning system; that's what matters. How far to adopt web standards is entirely up to you. The Roadmap for Mastodon is impressive, suggesting a wealth of innovation ahead.

In my view, it would be beneficial to ensure consistency in the data returned across different content types, and on a broader scale having consistent data helps others to interop, as well as offering features such as semantic interop, extensibility, scalability. You'll have to decide which parts of web standards are useful to you, and others will follow that, and/or build bridges.

Let's think about fixing this bug in the next ActivityStreams update, whether that's in the errata or a fresh version. With all the updates and changes in the W3C stack and linked data, we really want to avoid things breaking - from tools, to interoperability, and libraries.

Just consider the surge of millions of new users coming into the space. Upholding standards for smooth interaction is more important than ever. Big names like Meta have a history of playing by these standards, which just makes everything work better.

If there's an intentional move away from the standard, it'd be cool to know why and what the benefits might be. If there's no clear reason, I'd suggest we go ahead and squash this bug in the next version.

evanp commented

#529 (comment)

I think we need to get this resolved. I'm going to try to merge @csarven 's patch for this, and get the document moved on the W3C servers.

@evanp , your reasoning makes sense to me, but just want to say that I don't entirely feel comfortable about merging my PR 504 until there is a clear understanding / response to these concerns in particular:

  • #504 (comment) - I think this is important.
  • #416 (comment) ("is [it] correct or meaningful to have a resource in which the JSON-LD representation includes the [content of] JSON-LD context and the vocab (information in OWL and/or RDFS).")

Perhaps I've missed some responses that okays things.

It'd be great to have some guidance from @pchampin @plehegar - W3C team contacts of SWICG.

Finally got some time to dig into this issue and the related PR...

Regarding the "IRI collision": I would generally advise against using the same IRI for the vocabulary namespace and the context. First, because it reinforces the common misconception that the context describes the vocabulary in the same sense that an RDFS/OWL ontology describes it. Second, because it makes the context document bigger than necessary.
However, I don't consider this conflation to be inherently wrong (*), so since this ship has sailed, let's go along with it.

One way to comply with Linked Data best practices and keep the context document small enough would be to include a single triple for each term, of the form <#the-term> rdfs:isDefinedBy <activitystreams-owl>.

(*) I have a very broad notion of "sameness" 😉
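The single-triple suggestion above would keep the context document small while still satisfying Linked Data lookups. A sketch of what generating those entries as JSON-LD nodes could look like; the term list and the OWL document URL are illustrative placeholders, not the actual published identifiers:

```python
import json

# Placeholder term list and OWL document URL, for illustration only.
TERMS = ["Activity", "Object", "Link"]
OWL_DOC = "https://www.w3.org/ns/activitystreams-owl"

def is_defined_by_nodes(terms, owl_doc=OWL_DOC):
    """One <#term> rdfs:isDefinedBy <owl_doc> statement per vocabulary term,
    expressed as JSON-LD node objects."""
    return [
        {"@id": "as:" + term,
         "rdfs:isDefinedBy": {"@id": owl_doc}}
        for term in terms
    ]

nodes = is_defined_by_nodes(TERMS)
print(json.dumps(nodes[0]))
```

Each node contributes exactly one triple, so the context document grows by only one small object per term while still pointing consumers at the full OWL description.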

It'd be great to have some guidance from @pchampin @plehegar - W3C team contacts of SWICG.

If the guidance request is about how to propagate changes on W3C servers, a ping to one of us should do the trick, but it needs to have or point to a clear set of requested changes.

Note that we have a repo to help the maintenance of w3.org/ns.

Thanks @pchampin . I'd appreciate a deep/rough/mean review from you at #504 . If you / W3C finds that adequate and there are no objections, yes @plehegar , it'd be great to publish it on w3.org at your convenience. There are other changes in the pipeline for RDF/OWL which we'll eventually get out but the 2 comments that I referred to in #416 (comment) are quite fundamental (IMO) as to what representations are to be expected when the resource resolves. Again, I wanted to tread carefully here since I've raised it and have been dying to get more eyes on this matter.

@plehegar , on a related note to above, I initially wrote a long response here but then decided to summarise it for you at w3c/ns#2 . Would appreciate your guidance. Thanks!