w3c/activitypub

Standardize discovery using link rel on user-visible URLs

riking opened this issue · 34 comments

Problem

Given an arbitrary HTML document or URL on the Web, determine whether it shows the human-readable version of an ActivityPub object.

Given a URL that the user knows represents an ActivityPub object, interact with it (Like, replies.Add, Flag) from an existing account (an Actor on a different server).

Allow the user agent (read: "browser extension") to offer affordances such as an account selector when on AP-enabled pages.

Motivation: let people use their accounts elsewhere (Masto/Pleroma) to comment / like / flag a PeerTube video, which is typically a "drive-by" activity and should not go through the full friction of creating an account.

Recommendation (WIP)

HTML documents that represent an ActivityPub object SHOULD contain metadata in the form of <link> elements pointing to the canonical URL of the ActivityPub object.

<link rel="alternate" type="application/activity+json" href="https://social.example.com/post/1234">

The ActivityPub URL MAY be the same as the URL of the HTML view. Clients must specify an Accept header when requesting the ActivityPub view of the object.
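Here is a minimal sketch of the client side of this recommendation. It scans the page for a rel=alternate <link> with an ActivityStreams type, resolves its href, and fetches it with the Accept header the spec requires. The function names are hypothetical. The regex-based tag scanning is an assumption that only holds for simple, well-formed <head> markup; a browser extension would more likely use DOMParser.

```javascript
// Sketch only: regex-based scanning of <link> tags, assumed to be enough for
// simple, well-formed <head> markup. A real client should use an HTML parser.
const AS2_ACCEPT =
  'application/ld+json; profile="https://www.w3.org/ns/activitystreams"';

// Decode the character references most likely to appear in attribute values.
function decodeEntities(s) {
  return s
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}

// Return each <link> element as an object keyed by lowercase attribute name.
function parseLinkTags(html) {
  const attrRe = /([^\s=\/]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  const links = [];
  for (const [, attrText] of html.matchAll(/<link\b([^>]*)>/gi)) {
    const attrs = {};
    for (const m of attrText.matchAll(attrRe)) {
      attrs[m[1].toLowerCase()] = decodeEntities(m[2] ?? m[3] ?? m[4]);
    }
    links.push(attrs);
  }
  return links;
}

// Find a rel=alternate link whose type is an ActivityStreams media type and
// resolve its href against the page URL. Returns null if there is none.
function findActivityPubAlternate(html, pageUrl) {
  for (const link of parseLinkTags(html)) {
    const rels = (link.rel ?? "").toLowerCase().split(/\s+/);
    const type = (link.type ?? "").toLowerCase();
    const isAs2 =
      type.startsWith("application/activity+json") ||
      (type.startsWith("application/ld+json") &&
        type.includes("https://www.w3.org/ns/activitystreams"));
    if (rels.includes("alternate") && isAs2 && link.href) {
      return new URL(link.href, pageUrl).href;
    }
  }
  return null;
}

// Dereference the discovered URL, sending the Accept header the spec requires.
async function fetchActivityPubObject(url) {
  const res = await fetch(url, { headers: { Accept: AS2_ACCEPT } });
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching ${url}`);
  return res.json();
}
```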

Applications that support interacting with ActivityPub objects on other servers SHOULD offer the user a location to insert a URL and load an object they were previously unaware of.
[This is already implemented in Mastodon & Pleroma's MastoFE: they will check URLs put into the search bar for the <link rel> described earlier.]

This is primarily concerned with objects that have a conceptual "full-page" representation in the service's native UI. Use cases such as identifying the individual comments on a Video whose replies collection contains several Notes are secondary.

  • one solution: give every comment its own URL. YouTube does this slightly awkwardly with ?linked_comment_id.

This is pretty rambly, but I don't want to lose my work, so I'm going to submit it and see what people think.

This would allow a static site to serve ActivityPub objects as well as human-readable HTML. Currently it's not possible to do this, since static sites can't do content negotiation of course.

People who want to host static sites typically are not the same people who are going to custom-compile their nginx in order to install a content negotiation module.

The two very popular use cases that don't work when content negotiation is required are:

  1. hosting a site on GitHub Pages, Netlify, Amazon S3, etc.

  2. using a caching CDN like CloudFlare

It would be really sad to completely exclude these very popular services from participating in the ActivityPub network.
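To illustrate the static-site case, a hypothetical build step could emit the HTML page and its ActivityStreams JSON side by side, with the <link> tying them together. Every name here is an assumption: buildStaticPost, the .as2 file extension (borrowed from the example later in this thread), and the Article type. The host would still need to serve the .as2 file with a suitable Content-Type, which depends on the host's configuration.

```javascript
// Hypothetical static-site build step: emit <slug>/index.html and
// <slug>/index.as2 for one post. The .as2 extension and the Article type are
// assumptions; serving .as2 with an ActivityStreams Content-Type is up to the
// host's configuration.
const AS2_CONTEXT = "https://www.w3.org/ns/activitystreams";

// Minimal escaping for text and attribute values interpolated into HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildStaticPost({ baseUrl, slug, title, contentHtml, actorUrl }) {
  const pageUrl = new URL(`${slug}/`, baseUrl).href;
  const as2Url = new URL(`${slug}/index.as2`, baseUrl).href;

  const article = {
    "@context": AS2_CONTEXT,
    id: as2Url,
    type: "Article",
    attributedTo: actorUrl,
    name: title,
    content: contentHtml, // HTML string; AS2 content defaults to text/html
    url: pageUrl, // points back at the human-readable page
  };

  const html = [
    "<!doctype html>",
    "<html><head>",
    `<title>${escapeHtml(title)}</title>`,
    `<link rel="alternate" type="application/activity+json" href="${escapeHtml(as2Url)}">`,
    "</head><body>",
    contentHtml,
    "</body></html>",
  ].join("\n");

  // Map of output path -> file contents, ready to be written to disk.
  return {
    [`${slug}/index.html`]: html,
    [`${slug}/index.as2`]: JSON.stringify(article, null, 2),
  };
}
```

Because both files are plain static output, nothing here depends on the server inspecting the Accept header.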

Even Mastodon supports alternate URLs for the ActivityPub representations of its pages, e.g. https://mastodon.social/@Gargron vs https://mastodon.social/@Gargron.json, so it seems like it wouldn't be a huge stretch to have it advertise those URLs on the HTML pages.

How is the current mechanism of setting the Accept header in an HTTP request to be application/ld+json; profile="https://www.w3.org/ns/activitystreams" (for the AS object representation) or text/html (for its webpage representation) insufficient?

aaronpk mentioned scenarios where it isn't sufficient in the comment just before yours.

Thanks, I must have missed the refreshed page with responses.
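For contrast, here is a sketch of what a client relying on the Accept header alone has to check: whether the response actually came back as ActivityStreams. A static host that cannot negotiate will typically answer with text/html, and that is where <link> discovery has to take over. The helper names (parseMediaType, isActivityStreamsMediaType, fetchViaConneg) are hypothetical, and the parameter parsing is simplified.

```javascript
// Sketch: check whether a response's Content-Type is an ActivityStreams type.
// Assumption: parameters are split naively on ";", which is fine for the
// profile URI used here but not for every possible quoted parameter value.
const AS2_PROFILE = "https://www.w3.org/ns/activitystreams";

function parseMediaType(header) {
  const [essence, ...paramParts] = header.split(";");
  const params = {};
  for (const part of paramParts) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const key = part.slice(0, eq).trim().toLowerCase();
    let value = part.slice(eq + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1); // strip surrounding quotes
    }
    params[key] = value;
  }
  return { essence: essence.trim().toLowerCase(), params };
}

function isActivityStreamsMediaType(contentType) {
  const { essence, params } = parseMediaType(contentType);
  if (essence === "application/activity+json") return true;
  // The profile parameter may list several space-separated URIs.
  return (
    essence === "application/ld+json" &&
    (params.profile ?? "").split(/\s+/).includes(AS2_PROFILE)
  );
}

// Try content negotiation; return null when the server ignored the Accept
// header (e.g. a static host that always answers with text/html).
async function fetchViaConneg(url) {
  const res = await fetch(url, {
    headers: { Accept: `application/ld+json; profile="${AS2_PROFILE}"` },
  });
  const contentType = res.headers.get("content-type") ?? "";
  return res.ok && isActivityStreamsMediaType(contentType) ? res.json() : null;
}
```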

  1. Given that ActivityPub is built on top of HTTP, and HTTP's content negotiation is optional, and HTML is distinct from ActivityPub, is it worth shoehorning an HTML detail into the ActivityPub specification to cover a specific HTML+HTTP use case? Or would this be better as a convention outside of ActivityPub?

  2. Do caching CDNs not allow one to specify the Content-Type of the thing being cached at a given URI? My gut feeling says it's a limitation of the caching mechanism's implementation, but I am not familiar with HTTP caching, so I'm not sure whether caching based on the request's Accept header is permitted.
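On question 2: HTTP does let a shared cache store several representations under one URL, keyed on the request headers that the origin lists in its Vary response header. Whether a particular CDN honours Vary: Accept is product-specific and is not assumed here. Below is a sketch of the origin side. The function names are hypothetical, and the Accept parser is simplified (it ignores wildcard specificity rules).

```javascript
// Sketch of server-side content negotiation for one URL with two
// representations. Assumptions: only the outcomes "as2" and "html",
// and a simplified Accept parser.
const AS2_PROFILE = "https://www.w3.org/ns/activitystreams";

function chooseRepresentation(accept = "") {
  let best = { kind: "html", q: -1 }; // default when nothing matches
  for (const range of accept.split(",")) {
    const [rawType, ...params] = range.split(";").map((s) => s.trim());
    const type = rawType.toLowerCase();
    let q = 1;
    let profile = "";
    for (const param of params) {
      const [key, value = ""] = param.split("=").map((s) => s.trim());
      if (key.toLowerCase() === "q") q = Number(value);
      if (key.toLowerCase() === "profile") profile = value.replace(/^"|"$/g, "");
    }
    let kind = null;
    if (type === "application/activity+json") kind = "as2";
    else if (type === "application/ld+json" && profile === AS2_PROFILE) kind = "as2";
    else if (["text/html", "text/*", "*/*"].includes(type)) kind = "html";
    // Highest q wins; on a tie the earlier media range is kept.
    if (kind !== null && q > best.q) best = { kind, q };
  }
  return best.kind;
}

function responseHeaders(kind) {
  return {
    "Content-Type":
      kind === "as2"
        ? `application/ld+json; profile="${AS2_PROFILE}"`
        : "text/html; charset=utf-8",
    // Vary tells shared caches that the body depends on the Accept header;
    // a cache that ignores it may hand HTML to an ActivityPub client.
    Vary: "Accept",
  };
}
```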

The fact that Mastodon and Pleroma already support this type of discovery makes it a good candidate for including in the spec. I think the language in the original proposal is a great start and we should continue iterating on it.

I've long been a proponent of anything reachable via Accept also having some kind of <link> equivalent. People fought this out before because of the perceived complexity of discovery, but I was never convinced by that. You just gain so much by allowing more possible content sources. It's exactly what's needed for static sites, and for hosted sites that may force particular headers, content types, etc. Grimy real-world shenanigans, basically.

The only recommendation I have is that the precedence order in which discovery happens be well described: HTTP headers first, <link>s second, all other context last. Something like that. And, yeah, maybe a warning about caching when the URL is reused... a hint that Cache-Control or Vary might be important, but that's a general implementation detail, so it's also technically not necessary to add.
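Here is a sketch of that precedence order. It assumes the client has already fetched the page and extracted its <link> attributes with some HTML parser. The Link header parser is deliberately simplified (no escaped quotes inside parameters), the third tier ("all other context") is left out, and all names are hypothetical.

```javascript
// Sketch of discovery with an explicit precedence order:
// 1. the HTTP Link header, 2. <link> elements in the HTML.
// `htmlLinks` is assumed to be an array of attribute objects.

function parseLinkHeader(header) {
  const links = [];
  for (const [, target, paramText] of header.matchAll(/<([^>]*)>([^<]*)/g)) {
    const params = {};
    const paramRe = /;\s*([^=;,\s]+)\s*(?:=\s*(?:"([^"]*)"|([^;,\s]*)))?/g;
    for (const m of paramText.matchAll(paramRe)) {
      params[m[1].toLowerCase()] = m[2] ?? m[3] ?? "";
    }
    links.push({ ...params, href: target });
  }
  return links;
}

function isAs2Alternate(link) {
  const rels = (link.rel ?? "").toLowerCase().split(/\s+/);
  const type = (link.type ?? "").toLowerCase();
  return (
    Boolean(link.href) &&
    rels.includes("alternate") &&
    (type.startsWith("application/activity+json") ||
      type.startsWith("application/ld+json"))
  );
}

function discoverActivityPubUrl(pageUrl, { linkHeader = "", htmlLinks = [] } = {}) {
  // Earlier sources win: the header is available before the body is parsed.
  for (const candidates of [parseLinkHeader(linkHeader), htmlLinks]) {
    const hit = candidates.find(isAs2Alternate);
    if (hit) return new URL(hit.href, pageUrl).href;
  }
  return null; // the "all other context" tier is not implemented here
}
```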

Just wanted to chime in here again to say that my implementation now also includes a rel=alternate link to the ActivityStreams JSON representation of the page.

e.g.

https://aaronparecki.com/2018/07/12/10/indieauth

<link rel="alternate" type="application/activity+json" 
      href="https://aaronparecki.com/2018/07/12/10/indieauth.as2" />

That brings this up to at least 3 implementations that support it, making it a good candidate to incorporate into the spec.

It's now 2023, and for half a decade since 2018 there have been at least two server implementations, namely Mastodon and Misskey (Pleroma's MastoFE is no longer maintained). I guess it's time to specify this behavior somewhere; is there any reason for me not to go ahead with a PR?

The main branch has been inactive since 2019; maybe that's a reason?

Edit: actually at least 3 including #310 (comment), which seemingly is based on p3k.

Edit: I have a PR to a browser extension to make use of this for redirection to the specified Mastodon home server, so there's a use case too!

Slated for development: Lighthouse will check both the HTTP headers and the HTML of the document for rel=alternate. This will allow things like static HTML documents or "less-controllable-by-users" Web pages (found on shared hosting) to also be accessible in ActivityPub interactions.

(Originally published at: https://jacky.wtf/2023/3/gPyQ)

I would really love to have this for the peer-to-peer static site publishing use case we're planning for https://distributed.press

Even though the content negotiation thing is "easy to do" if you have control over the server, in some cases, like peer-to-peer protocols or static site gateways, it's impossible.

Are there AP server implementors who would be happy to get PRs adding this functionality? I would love to contribute this to Mastodon and Calckey, for example.

@RangerMauve based on #310 (comment), Mastodon and probably Calckey already support this, since Calckey is a fork of Misskey.

Thanks for the discussion, all! Bridgy Fed has supported HTML => AS2 discovery via rel-alternate for a few months now too.

Oh wonderful! I'll make sure to include it in our testing then. Thank you for the heads up!

@snarfed @RangerMauve I tried to figure out what rel=me would mean, but it is hard, because it is not a registered link relation. Can you link me to the relevant specification and/or community that would be appropriate to manage a registration at https://www.iana.org/assignments/link-relations/link-relations.xhtml ?

I posted this on the wrong thread! I'm sorry.

@gobengo this proposal was about rel=alternate, which is registered at that link. Where are you seeing the discussion about rel=me in relation to this issue?

@aaronpk You're right. I think I messed up juggling a couple tabs. I'll correct my comment and put it in the appropriate place. Sorry! (I think the rel=alternate makes the most sense!)

@kevinmarks I understand it's controversial, but since this is a w3c repo, I use this https://www.w3.org/TR/2011/WD-html5-20110405/links.html#linkTypes

Just found that Threads has started to include a link tag: <link href="https://www.threads.net/&#064;zuck/post/C0zXcQmxO77" type="application/activity+json" />, but somehow without rel=alternate.

I think there are two choices here for us. One is to defer this to a FEP; we often do this for extension functionality, especially if there is new development to be done in the area.

The other is to develop a report from the CG. This more often happens when we have existing implementations, and we want to solemnize them with the CG's recognition, as with Webfinger and HTTP Signature. It seems like this particular profile is pretty close to widely implemented, so I think we could lean into this report structure if there's appetite in the CG.

There are a few discovery mechanisms mentioned in the thread:

  • Link: header in HTTP response
  • Content negotiation
  • LRDD

I think all of these have their advantages and disadvantages that could be discussed. I think the next step would be for the CG to decide which path to follow here.

Since this is on the ActivityPub tracker, not the ActivityStreams one, I'd just like to note that I think the most appropriate media type to use in a <link rel=alternate> is the only media type that ActivityPub clients are required to use:

The client MUST specify an Accept header with the application/ld+json; profile="https://www.w3.org/ns/activitystreams" media type in order to retrieve the activity.

https://www.w3.org/TR/activitypub/#retrieving-objects

<link
  rel="alternate"
  type='application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
  href="https://social.example.com/post/1234">

The rationale is that if a crawler follows these links and attempts to dereference the rel=alternate URL, it will also end up putting into the Accept header the media type that ActivityPub clients are required to include. That way the crawler is most likely to succeed against implementations that meet all the requirements, even if they support none of the optional media types.

IMO it's probably wise for HTML page publishers to include both links, one for each media type that they know they support, but only one of them is the 'activitypub media type' that ActivityPub Conformant Servers must support.
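Here is a sketch of the crawler behaviour described above. Among the advertised alternates, it prefers the media type that ActivityPub servers must support, falls back to application/activity+json, and sends the chosen link's type attribute as the Accept header. The input format (attribute objects from any HTML parser) and the function names are assumptions.

```javascript
// Sketch: pick among several rel=alternate links, preferring the media type
// ActivityPub servers must support, and reuse the link's `type` attribute
// verbatim as the Accept header.
const AP_REQUIRED_TYPE =
  'application/ld+json; profile="https://www.w3.org/ns/activitystreams"';

// Lowercase and drop whitespace so equivalent spellings compare equal.
const normalizeType = (t) => t.toLowerCase().replace(/\s+/g, "");

function rankType(type) {
  const t = normalizeType(type);
  if (t === normalizeType(AP_REQUIRED_TYPE)) return 0; // required by the spec
  if (t.startsWith("application/activity+json")) return 1; // widely deployed
  return Infinity; // not an ActivityStreams type
}

function pickAlternate(links, pageUrl) {
  const ranked = links
    .filter(
      (l) =>
        l.href &&
        l.type &&
        (l.rel ?? "").toLowerCase().split(/\s+/).includes("alternate") &&
        rankType(l.type) !== Infinity
    )
    .sort((a, b) => rankType(a.type) - rankType(b.type));
  if (ranked.length === 0) return null;
  const best = ranked[0];
  return { url: new URL(best.href, pageUrl).href, accept: best.type };
}
```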

@aaronpk it's interesting to note 6 years later that https://aaronparecki.com/@aaronpk works just fine in 2024, and https://aaronparecki.com/@aaronpk.json... well, try it yourself.

Or mine: https://tech.lgbt/@aytvill is just fine on Mastodon v4.3.0+...+glitch, but https://tech.lgbt/@aytvill.json... just says the request is not signed.

Which is to say: the federated environment is neither consistent nor holding itself to its own standards.

@aytvill Could you elaborate on why you are surprised with the outcome? Generally there are no specs that I am aware of saying that adding a .json string to URLs should generate a JSON version of a webpage. From what I can see adding the expected Accept header resolves to the JSON version of the site. Are there other implementations that support this .json feature?

await (await fetch(window.location.href, {headers:{Accept:"application/json"}})).json()

Mastodon pull request comment: mastodon/mastodon#30398 (comment)

This was discussed at the SWICG meeting on 2024-08-02: https://www.w3.org/wiki/SocialCG/2024-08-02#%60%60%60Rel=author%60%60%60_topic_revived_from_last_Meeting

  • PROPOSAL: We prefer the use of rel=author for identifying authors of web resources.
  • PROPOSAL: that proposal + type=application/activity+json means that the href supports ActivityPub discovery and following (modulo open question about which specific discovery mechanisms)
  • PROPOSAL: Open a task force to focus on a discovery of ActivityPub actors and objects from HTML.

All 3 proposals RESOLVED (agreed on by the members present; no objections via -1s).
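Here is a sketch of how a consumer might apply the first two resolutions by collecting ActivityPub actor URLs from rel=author elements. The input (attribute objects from <link> and <a> elements) and the function name are assumptions. The meeting left the follow-up discovery mechanism open, so the sketch stops at returning absolute URLs.

```javascript
// Sketch: collect hrefs from elements that carry rel="author" together with
// type="application/activity+json". `elements` is assumed to hold attribute
// objects for both <link> and <a> elements.
function findActivityPubAuthors(elements, pageUrl) {
  const authors = new Set();
  for (const el of elements) {
    const rels = (el.rel ?? "").toLowerCase().split(/\s+/);
    // Compare the media type essence only, ignoring any parameters.
    const essence = (el.type ?? "").split(";")[0].trim().toLowerCase();
    if (el.href && rels.includes("author") && essence === "application/activity+json") {
      authors.add(new URL(el.href, pageUrl).href); // the Set drops duplicates
    }
  }
  return [...authors];
}
```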

Say what you will about content negotiation, this is exactly what it is for.

I've read through the meeting minutes, and I think they're addressing a substantially different issue from the rest of this ticket (how to let pages outside ActivityPub "connect" themselves to ActivityPub users without necessarily being represented in ActivityPub themselves). I don't really think we should lump it in with the use cases this ticket describes (or the majority of the use cases described in the task force repo).

I agree with resolutions 1 and 2, but I'm -1 on resolution 3 because I don't think that "discovery of ActivityPub actors and objects from HTML" is really what's going on with mastodon/mastodon#30398.