bokand/web-annotations

A suggestion: Subresource Integrity hashes for Web annotations.


Not necessarily something that needs to be in the first version of a web annotation feature, but I think it would be good to optionally include something equivalent to two Subresource Integrity hashes. The first would cover the section of content you are linking to (perhaps with a CSS selector to identify the specific content, or just using the text fragment feature), so the annotation link can verify that the content the annotation is being loaded for is the same content the annotation was made for.
As an example, this could help with situations where a positive annotation on a product page could be misused by the site owner switching the product for a different product.
If the hash verification fails when the page loads, and the annotation was not fully embedded in the annotation link, a report could be sent back to the annotation site. That way the site serving the annotation data can also learn when annotation links are no longer valid.
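A minimal sketch of how the first hash might be produced when the annotation is created, assuming the annotating tool has already extracted the selected text and hashes its UTF-8 bytes. SRI itself is defined for fetched resources (the `integrity` attribute on `<script>` and `<link>`), so reusing its `sha256-<base64 digest>` string format for a text snippet is an assumption here, and `integrity_for` is a hypothetical helper name.

```python
import base64
import hashlib


def integrity_for(content: str, algorithm: str = "sha256") -> str:
    """Return an SRI-style string, e.g. 'sha256-<base64 digest>', for a text snippet.

    Assumption: the snippet is hashed as UTF-8 bytes; SRI only defines hashing
    of fetched resource bodies, not of text extracted from a page.
    """
    if algorithm not in ("sha256", "sha384", "sha512"):  # the algorithms SRI lists
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content.encode("utf-8")).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # Computed once, when the annotation is created, and stored alongside it.
    print(integrity_for("Example paragraph the annotation is attached to."))
```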
The second hash would be to verify the annotation being loaded is the annotation that was expected. This way, if the site owner is embedding a specific annotation, the site loading it could also verify that the annotation it serves to its users is the same one the developer saw when testing, which would prevent the annotation site from swapping annotations for advertisements based on the requester.
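For the second hash, the annotation has to be serialized the same way every time before it is hashed. A sketch assuming the annotation arrives as a JSON object: the field names in the usage lines are hypothetical, and sorted-key compact JSON is a simple stand-in for a standardized canonical form such as RFC 8785 (JSON Canonicalization Scheme).

```python
import base64
import hashlib
import json


def annotation_integrity(annotation: dict) -> str:
    """SRI-style sha256 string over a canonical JSON serialization of an annotation."""
    # Sorted keys and no insignificant whitespace, so key order or formatting
    # differences from the annotation server do not change the hash.
    canonical = json.dumps(
        annotation, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def is_expected_annotation(annotation: dict, expected: str) -> bool:
    """True if the annotation served now matches the one the developer reviewed."""
    return annotation_integrity(annotation) == expected


if __name__ == "__main__":
    reviewed = {"target": "#product-description", "body": "Great value!"}  # hypothetical fields
    pinned = annotation_integrity(reviewed)
    swapped = {"target": "#product-description", "body": "Sponsored: buy X"}
    print(is_expected_annotation(reviewed, pinned))  # True
    print(is_expected_annotation(swapped, pinned))   # False
```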

Thanks for the feedback!

As an example, this could help with situations where a positive annotation on a product page could be misused by the site owner switching the product for a different product.

Thanks, this is an interesting case worth considering. Since you mention using a selector/text-fragment, I guess you're thinking of an annotation that's targeting a page, rather than specific content on it? I wonder if the annotation community has considered this kind of issue?

I guess it's a bit tricky because if the annotation isn't meant to be tied to any particular part of the page, it's hard to say what part of the page should be considered important. Pages change so we'd have to be resilient to content changing. In an adversarial situation, if the page content is used only to verify integrity, rather than visually attaching an annotation to it, the author could include the specific content in a hidden area to pass the check.
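One way to be somewhat resilient to cosmetic page changes is to normalize the extracted text before hashing, so that re-wrapped lines or a different Unicode composition of the same characters produce the same digest. This is a sketch of that idea, not something the proposal specifies, and it does nothing about the hidden-content problem, since it only sees whatever text the check is handed.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Reduce cosmetic differences before hashing (an assumed policy, not a standard)."""
    # NFC: composed and decomposed forms of e.g. 'é' become identical.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of spaces, tabs and newlines left by reflowed markup.
    return re.sub(r"\s+", " ", text).strip()


def normalized_digest(text: str) -> str:
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = "Caf\u00e9 opening  hours:\n  9am to 5pm"
    reflowed = "Cafe\u0301 opening hours: 9am to 5pm"
    print(normalized_digest(original) == normalized_digest(reflowed))  # True
```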

The second hash would be to verify the annotation being loaded is the annotation that was expected.

I'm not sure I understand the use case here, could you elaborate on it? Generally, a site author wouldn't put "expected" content into an annotation. Isn't the whole point that annotations can be added/created beyond what the author created?

I guess if a page gets the ability to specify a source for annotations to load from, it might only want to show approved annotations, but at that point it seems simpler to just host the annotations inline in the HTML rather than on a third-party host.

Hi,
I thought I would give an example of a real-world use case I have in mind where providing a hash to verify the content you want to annotate could help.

I help maintain an online learning system. In its courses, teachers often link to websites, and across all our courses we have hundreds of thousands of links to external sites.
We built a link checker to find broken links, but it is much harder to find sites where the content has changed, or where the domain has been taken over, since the teacher first linked to them. In some courses, students also access the content with a different login from the one the link checker uses, for example to reach journal articles or other items we are not allowed to crawl.
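The link-checker side could compare against hashes stored when each link was created. A sketch under these assumptions: `fetch` is a hypothetical stand-in for an HTTP request plus text extraction that returns `None` for a broken link, and the whole returned text is hashed for brevity, whereas the proposal would hash only the annotated element. Content behind a student's login still can't be checked this way, which is the gap a client-side check in the annotation link would cover.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class LinkRecord:
    url: str
    expected_sha256: str  # hex digest stored when the teacher created the link


def check_links(
    records: Iterable[LinkRecord], fetch: Callable[[str], Optional[str]]
) -> Dict[str, str]:
    """Classify each link as 'ok', 'changed' or 'broken'."""
    results = {}
    for record in records:
        text = fetch(record.url)
        if text is None:
            results[record.url] = "broken"
            continue
        actual = hashlib.sha256(text.encode("utf-8")).hexdigest()
        results[record.url] = "ok" if actual == record.expected_sha256 else "changed"
    return results


if __name__ == "__main__":
    def sha(s: str) -> str:
        return hashlib.sha256(s.encode("utf-8")).hexdigest()

    # Fake pages standing in for real fetches.
    pages = {"https://example.org/a": "same text", "https://example.org/b": "new text"}
    records = [
        LinkRecord("https://example.org/a", sha("same text")),
        LinkRecord("https://example.org/b", sha("old text")),
        LinkRecord("https://example.org/gone", sha("anything")),
    ]
    print(check_links(records, pages.get))
```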

Ideally we would like teachers to be able to link to specific content, verify that the content is correct, and then annotate it with a note for the student.

As an example, a teacher can already link to a specific paragraph on a Wikipedia page using the text-fragment method:
https://en.wikipedia.org/wiki/Annotation#:~:text=From%20a%20cognitive%20perspective,speech%20without%20annotation
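For reference, the text fragment in that link follows the `#:~:text=[prefix-,]textStart[,textEnd][,-suffix]` shape from the Text Fragments spec. A simplified parser (it does no validation beyond splitting and percent-decoding, and `parse_text_directives` is a hypothetical name) shows how the start and end of the Wikipedia paragraph are recovered:

```python
from typing import Dict, List, Optional
from urllib.parse import unquote, urlsplit


def parse_text_directives(url: str) -> List[Dict[str, Optional[str]]]:
    """Extract text=... directives from a URL's fragment directive (simplified)."""
    _, sep, directives = urlsplit(url).fragment.partition(":~:")
    if not sep:
        return []
    parsed = []
    for directive in directives.split("&"):
        if not directive.startswith("text="):
            continue  # other directives are ignored here
        # Split before decoding: literal commas in the text are encoded as %2C.
        parts = directive[len("text="):].split(",")
        entry: Dict[str, Optional[str]] = {
            "prefix": None, "start": None, "end": None, "suffix": None
        }
        if parts and parts[0].endswith("-"):
            entry["prefix"] = unquote(parts.pop(0)[:-1])
        if parts and parts[-1].startswith("-"):
            entry["suffix"] = unquote(parts.pop()[1:])
        if parts:
            entry["start"] = unquote(parts[0])
        if len(parts) > 1:
            entry["end"] = unquote(parts[1])
        parsed.append(entry)
    return parsed


if __name__ == "__main__":
    url = ("https://en.wikipedia.org/wiki/Annotation"
           "#:~:text=From%20a%20cognitive%20perspective,speech%20without%20annotation")
    print(parse_text_directives(url))
```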

With this annotation proposal, you could let the teacher create an equivalent link that annotates that paragraph:
https://en.wikipedia.org/wiki/Annotation#:~:note(selector=mw-content-text>div.mw-parser-output>p:nth-child(14),text=teachers-comment-here)
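The `note(...)` syntax in that link is only an illustration of the proposal, not an implemented browser feature, so the parser below is a hypothetical sketch. One detail it has to handle: the selector itself contains parentheses (`p:nth-child(14)`), so the arguments are split only on commas that are not nested inside parentheses.

```python
from typing import Dict, List, Optional
from urllib.parse import unquote, urlsplit


def split_top_level(args: str) -> List[str]:
    """Split on commas outside parentheses, so ':is(h2,h3)' stays in one piece."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_note_directive(url: str) -> Optional[Dict[str, str]]:
    """Return the key=value fields of a hypothetical note(...) fragment directive."""
    _, sep, directives = urlsplit(url).fragment.partition(":~:")
    if not sep:
        return None
    for directive in directives.split("&"):
        if directive.startswith("note(") and directive.endswith(")"):
            fields = {}
            for part in split_top_level(directive[len("note("):-1]):
                key, _, value = part.partition("=")
                fields[key] = unquote(value)
            return fields
    return None


if __name__ == "__main__":
    url = ("https://en.wikipedia.org/wiki/Annotation#:~:note("
           "selector=mw-content-text>div.mw-parser-output>p:nth-child(14),"
           "text=teachers-comment-here)")
    print(parse_note_directive(url))
```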

But I think it would be useful to be able to include a hash of the specific content that has been annotated: not the whole page, just the selected text or element. For the Wikipedia example, the hash would cover only the text of that paragraph. The rest of the page could change and the annotation would still be valid, but if that paragraph changed, the link would break or alert the user or author in some way.
https://en.wikipedia.org/wiki/Annotation#:~:note(selector=mw-content-text>div.mw-parser-output>p:nth-child(14),text=teachers-comment-here,hash=sha256-3700fa499086ae172461e9905e73716be862b47582eec58d12c9802fb0d417a1)
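On load, the browser (or a checker running in the student's session) would hash the selected element's text and compare it to the `hash` parameter. The example link writes the digest as hex after `sha256-`, while SRI uses base64; the sketch below accepts either, which is an assumption rather than something the proposal settles. `verify_target` is a hypothetical name, and the element's text is assumed to have been extracted already.

```python
import base64
import hashlib
import hmac
from typing import Optional

DIGEST_SIZES = {"sha256": 32, "sha384": 48, "sha512": 64}  # bytes per digest


def decode_digest(encoded: str, size: int) -> Optional[bytes]:
    """Accept hex (as in the example link) or base64 (as in SRI)."""
    if len(encoded) == 2 * size:
        try:
            return bytes.fromhex(encoded)
        except ValueError:
            pass  # right length but not hex; try base64 instead
    try:
        raw = base64.b64decode(encoded, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    return raw if len(raw) == size else None


def verify_target(element_text: str, hash_param: str) -> str:
    """Return 'valid', 'changed' or 'unsupported' for the annotated element's text."""
    algorithm, sep, encoded = hash_param.partition("-")
    size = DIGEST_SIZES.get(algorithm)
    expected = decode_digest(encoded, size) if sep and size else None
    if expected is None:
        return "unsupported"
    actual = hashlib.new(algorithm, element_text.encode("utf-8")).digest()
    # 'changed' is where the page would warn the student or notify the teacher.
    return "valid" if hmac.compare_digest(actual, expected) else "changed"


if __name__ == "__main__":
    paragraph = "Text of the annotated paragraph as the teacher saw it."
    param = "sha256-" + hashlib.sha256(paragraph.encode("utf-8")).hexdigest()
    print(verify_target(paragraph, param))                # valid
    print(verify_target(paragraph + " (edited)", param))  # changed
```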

On public pages this might not add a lot of value, but I believe some sites return different content depending on where the user has logged in from or whether they are signed in, and being able to verify that the annotation is displayed on the content it was intended for could be very valuable.