WICG/scroll-to-text-fragment

Integration with W3C Web Annotations

westurner opened this issue · 46 comments

It would be great to be able to comment on the linked resource text fragment. W3C Web Annotations [implementations] don't recognize the targetText parameter, so AFAIU comments are then added to the document#fragment and not the specified text fragment.

I see that W3C Web Annotation Data Model is linked to under 'Other'.

https://hypothes.is , for example, implements W3C Web Annotations for highlighting and commenting on lots of things; including text fragments. WA is a JSONLD spec: it's definitely possible to encode JSONLD in a URI fragment. urlencoded without newlines is easy, if maybe unnecessarily verbose.

WA already solves for referencing in-page images and things; but hasn't (yet?) defined an IRI fragment syntax.

Is there a simplified mapping of W3C Web Annotations to URI fragment parameters?

https://www.w3.org/TR/2017/REC-annotation-model-20170223/#bodies-and-targets

https://www.w3.org/TR/annotation-model/#bodies-and-targets

https://www.w3.org/TR/annotation-model/#embedded-textual-body

https://www.w3.org/TR/annotation-model/#css-selector (CSS Selector)

https://www.w3.org/TR/annotation-model/#xpath-selector (XPath Selector)

The examples that @westurner pasted are equivalent to targetText. A somewhat more reliable identifier would be:

http://csarven.ca/dokieli-rww#selector(type=TextQuoteSelector,prefix=ed%2C%20browser-based%20authoring%20and%20,exact=annotation,suffix=%20platform%20with%20built-in%20support%20)

(handled by https://dokie.li/ clientside application)
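To make that mapping concrete, here is a minimal sketch (Python, purely for illustration) of how a client might read such a fragment. The function name `parse_selector_fragment` is hypothetical, and the sketch assumes a flat `selector(key=value,...)` form in which commas inside values are percent-encoded, as in the URL above. It does not handle nested or refining selectors.

```python
from urllib.parse import unquote, urlsplit


def parse_selector_fragment(url: str) -> dict:
    """Parse a '#selector(key=value,...)' fragment into a dict of decoded values.

    Assumption: the fragment is flat (no nested selectors) and any comma
    inside a value is percent-encoded, so splitting on ',' is safe.
    """
    fragment = urlsplit(url).fragment
    if not (fragment.startswith("selector(") and fragment.endswith(")")):
        raise ValueError("not a selector(...) fragment")
    body = fragment[len("selector("):-1]
    result = {}
    for pair in body.split(","):
        key, _, value = pair.partition("=")
        # Decode only after splitting, so an encoded comma stays in its value.
        result[key.strip()] = unquote(value)
    return result


selector = parse_selector_fragment(
    "http://csarven.ca/dokieli-rww#selector(type=TextQuoteSelector,"
    "prefix=ed%2C%20browser-based%20authoring%20and%20,"
    "exact=annotation,"
    "suffix=%20platform%20with%20built-in%20support%20)"
)
```

For the URL above, `selector` ends up with the four keys `type`, `prefix`, `exact` and `suffix`, with the prefix decoded to `ed, browser-based authoring and `.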

Which selectors and states to use for a representation can be at the discretion of the application or context it is used in. W3C Note on Web Annotation Selectors and States goes much further.

I have started to map the W3C annotations to HTML using a text selector; it isn't too hard.

https://gist.github.com/jgmac1106/726a8399dce96e28e0ae9cfd1ad288c3

I did add metadata to the annotations to make them useful to widely used and existing parsers

Compatibility with other W3C solutions that use RDFa will make sense.

Glad to see other W3C-based approaches to annotation that rely on HTML.

Eventually these should talk. I currently export my annotations into HTML to display them on my website.

Is there a way to handle multiple attributes in the fragment e.g. so that
SPAs can just drop the e.g. &selector=... attribute when doing client side
routing?

Maybe something like:

#!route/two&selector=(...)

#!route/two#selector=(...)
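A rough sketch of what the first variant could mean for a client-side router. It assumes, hypothetically, that `&selector=` is the agreed marker; nothing here is specified anywhere, and the function name is made up for illustration.

```python
from typing import Optional, Tuple


def split_route_and_selector(
    fragment: str, marker: str = "&selector="
) -> Tuple[str, Optional[str]]:
    """Split a fragment into the SPA route and an optional selector part.

    Assumption: the selector, if present, is everything after the first
    occurrence of `marker`; the router only ever sees the route part.
    """
    route, found, selector = fragment.partition(marker)
    return route, (selector if found else None)


route, selector_part = split_route_and_selector(
    "!route/two&selector=(type=TextQuoteSelector,exact=annotation)"
)
```

The router would then match on `route` alone and hand `selector_part` to whatever does the text anchoring.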

Apache Annotator has an existing implementation (see the demo of the W3C's selectors note fragment identifiers mentioned above).

It would be great to involve more of the annotation community in this work before it gets shipped in Chrome.

The Open Annotation Community Group would be a great place to start.

Additionally, surfacing future Web Platform related Chrome features at the Web Incubator Community Group well before they ship would help the wider Web implementer and standards communities participate in discussions and development of the features they need.

Thanks!
🎩

dwhly commented

As soon as I saw the announcement, I came straightaway to file an issue here, and presto you all beat me to it.

I just want to recognize the work that @azaroth42 and @BigBlueHat did in the Selectors and States doc linked above, which while a "Note" and not a formal recommendation (yet), we had hoped would lead to initial browser support for native deep linking. It would be a shame to miss an opportunity to ship a key feature like this and not incorporate the key benefits they've outlined-- specifically the things that @csarven called out in the Hacker News bit, e.g. specific selection to short strings that may be repeated multiple times in the same page. Thanks also Sarven, for arguing a few of the key points that you did.

I'll call out another thing which should be obvious, but needs saying:

Software is easy to change. Implementations will evolve. What's difficult to change is a trillion URLs in the wild. Let's try to get this right (as possible) from the get go. I heartily endorse Benjamin's call for working this through in the CG. It doesn't have to be an onerous process.

Thank you so much @bokand for this work, and for sharing it openly! I hope you take this feedback in the most constructive way! :)

Short strings that occur multiple times are an artificial use case. The natural thing to do is quote a larger, more unique fragment to be more specific.
The poetry case is shown in my Fragmentions for poets post.

dwhly commented

Short strings that occur multiple times are an artificial use case.

Um, No!

We have data from now 4.7M (and rapidly growing) highlights and annotations-- public and private-- that say otherwise. :)

Certainly when you're wanting to share a meaningful bit of text from a poem with someone else, maybe you'll tend to do so with longer strings (i.e. longer than a word in length). But that misses all kinds of real use cases, from copyediting, to personal highlighting, to just establishing an anchor to scroll the visitor to where they should start reading. Also, keep in mind that certain kinds of documents (legal, govt) often repeat blocks of text.

We see these in the thousands everyday here at Hypothesis, and they are not corner cases, they are core.

The length of the selected text is beside the point. The core challenge is about distinguishing between the original selection and the one that is to be recalled/selected/scrolled/marked in the document eg. should it be the first occurrence, all of the occurrences, the last one, one at random, or the one in fact the user originally selected? Which BTW, can be multiple selections with different text - already handled by Firefox. One can arbitrarily come up with a prefix and suffix string to an exact selection eg. 32 characters before and after. But that alone doesn't necessarily address everything. And, if you are only working with exact (the selection that a user made), ie. "the targetText proposal doesn't necessarily result in a unique identifier that deterministically corresponds to user's original selection." ( https://news.ycombinator.com/item?id=19169582 )

What is more of an artificial use case is actually assuming that the user is selecting text and knowing that it will be unique in the document (at any point in time). The least assumption to make is that they are going to just make a selection, and think nothing further. Why should they be expected to? It punts a technical challenge (which is already addressed by the W3C Note) to the user because... ?

Hence, the point of the UI/user-agent within a particular context being able to make those distinctions. The W3C Selectors/States can handle these cases through whatever combination of fragment selectors (for robustness) that makes sense for a given resource representation, at a particular HTTP state, and time etc. All possible. Today.

The natural thing to do is quote a larger, more unique fragment to be more specific.

But if the person selecting the text isn't wanting that larger fragment selected, then it's not an accurate expression of the user's intent.

There's plenty to discuss here--even if we're just talking scrolling to text--so I'd invite all of you to join the Open Annotation Community Group.

@bokand would love to have you join the community there and share more about this work. Thanks!

dwhly commented

I'm just reading @bokand's note in the larger HackerNews thread which I missed till just now:

Feature author here.
I'd like to first clarify that this is still in the super-early stage of development; none of this is shipped or finalized yet. The feature hasn't even requested approval to ship at which point these kinds of issues would be brought up. We take web-compat very seriously. If this breaks even a small percentage of pages, it won't ship. Part of the shipping process is ensuring we have at least a draft spec (W3C, WHATWG) and at least some support from other vendors.

Sorry, the explainer text came off more dismissive than I intended. I wanted to get something implemented so we could start experimenting and see how this would work in the wild. #targetText= is a first attempt at syntax, any criticisms or data on how this might break would be appreciated.

From my (limited) understanding of how fragments are used like this today, the way this would break a page is if the page itself was using "targetText=..." and parsing the result. This is something we can and will measure and see if we have a naming collision. For pages that use a "#var=name" type fragment, we could append "&targetText=...".

I'm not tied to any particular syntax here so if I'm missing why this is a monumentally bad idea, please file a bug on the GitHub repo: https://github.com/bokand/ScrollToTextFragment/issues

Super encouraging, thank you! :)

look at Genius and lyrics

💯

$15M from Andreessen says that this is a very real and practical use case. Also decades of work in annotations around the world at the more theoretical level.

The core challenge is about distinguishing between the original selection and the one that is to be recalled/selected/scrolled/marked in the document eg. should it be the first occurrence, all of the occurrences, the last one, one at random, or the one in fact the user originally selected? Which BTW, can be multiple selections with different text - already handled by Firefox. One can arbitrarily come up with a prefix and suffix string to an exact selection eg. 32 characters before and after. But that alone doesn't necessarily address everything.

In terms of UI, some ideas:

  • add a further level of highlighting to indicate how much more text needs to be selected for there to be one unique match
  • indicate that there are multiple matches for the selection
  • give the option to select a nearby CSS selector or XPath (and indicate in the page what each would highlight; similar to 'Inspect Element' and page element removal tools such as e.g. uBlock)
    • note that there is no spec'd way to search the DOM in reverse?
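The first two ideas could be backed by something like the following sketch, which counts matches and grows the selection one character per side until it is unique. It works on a flat string rather than a DOM, and both function names are hypothetical.

```python
from typing import Tuple


def count_occurrences(text: str, needle: str) -> int:
    """Count occurrences of needle in text, including overlapping ones."""
    count = 0
    start = text.find(needle)
    while start != -1:
        count += 1
        start = text.find(needle, start + 1)
    return count


def expand_until_unique(text: str, start: int, end: int) -> Tuple[int, int]:
    """Grow the range [start, end) until text[start:end] occurs exactly once.

    The loop always terminates: in the worst case the range becomes the
    whole text, which occurs exactly once in itself.
    """
    while count_occurrences(text, text[start:end]) > 1:
        start = max(0, start - 1)
        end = min(len(text), end + 1)
    return start, end


page = "the cat sat; the dog sat"
# The user selected the second "sat" (characters 21..24).
matches = count_occurrences(page, "sat")
unique_range = expand_until_unique(page, 21, 24)
```

Here `matches` is 2, which a UI could surface as "multiple matches", and `unique_range` covers `g sat`, the extra context it could show as the second highlight level.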

Just as an admin comment for the record: many people referred to the (informal) selector-to-URL mapping scheme, but I have not seen a URL for it, so here it is:

https://www.w3.org/TR/2017/NOTE-selectors-states-20170223/#frags

As @dwhly noted, this is part of a NOTE, ie, a non-normative work at W3C. I would even call it speculative:-)

That being said: beyond the Apache work that @BigBlueHat referred to, there is also a simple, example converter:

https://w3c.github.io/web-annotation/selector-note/converter/

I have not touched that code for years, I am sure there are bugs and results of some bitrotting...

@iherman

https://www.w3.org/TR/2017/NOTE-selectors-states-20170223/#frags says:

Note that this representation is valid only if the IRI for the Source does not contain a fragment identifier of its own (an IRI may contain at most one fragment identification).

https://github.com/bokand/ScrollToTextFragment/issues/4#issuecomment-464036000

Is there a way to handle multiple attributes in the fragment e.g. so that
SPAs can just drop the e.g. &selector=... attribute when doing client side
routing?

Maybe something like:

#!route/two&selector=(...)

#!route/two#selector=(...)

The stock Hypothesis client uses these selectors: TextQuote, TextPosition, Range, and Fragment (closest ancestor ID).

Of these, TextQuote matters most. I know this because I've built a variety of Hypothesis-compatible apps, collectively responsible for hundreds of thousands of annotations, that only use the TextQuote selector, which alone can deliver nearly all the resilience to ambiguity and change that you get with a richer suite of selectors.

But TextQuote isn't just a string to match, it's also a prefix and suffix. The combination is what enables resilience to ambiguity and change.

So if you were to implement just one W3C-style selector, I'd recommend that one.
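As a minimal sketch of why the prefix and suffix matter, here is an exact-match resolver over a flat string. The function name is hypothetical, and it deliberately leaves out the fuzzy tolerance to change that real clients add on top.

```python
from typing import Optional, Tuple


def resolve_text_quote(
    text: str, exact: str, prefix: str = "", suffix: str = ""
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of `exact` that is
    immediately preceded by `prefix` and followed by `suffix`, else None."""
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        start = text.find(exact, start + 1)
    return None


page = "total: 3 apples, total: 5 pears"
first = resolve_text_quote(page, "total")                  # no context given
second = resolve_text_quote(page, "total", suffix=": 5")   # disambiguated
```

Without context the resolver can only return the first "total"; with the suffix it picks the second one, which is the ambiguity case discussed above.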

But TextQuote isn't just a string to match, it's also a prefix and suffix. The combination is what enables resilience to ambiguity and change.

💯 The TextPosition version is really only useful for static text that you can't share for some reason. Very precise, but very brittle. And not possible with any standard fragment patterns.

I think we're approaching the same thing from different directions. A string with sufficient prefix and suffix to make it unique within the document is the right answer for a scroll target. Labelling part of it as a more specific target to highlight is a useful enhancement.

The Genius website is not a great example of handling annotation with sensitivity.

dwhly commented

The Genius website is not a great example of handling annotation with sensitivity.

Perhaps not, but it is a website full of tens of thousands of texts with strings of a slightly longer length that repeat two, three or even more times (aka choruses). Presumably texts that Chrome users might want to share a deep link to with the best chance of being reanchored accurately.

s/useful/necessary/

@westurner (per https://github.com/bokand/ScrollToTextFragment/issues/4#issuecomment-464485999): maybe. I would like to have a clean processing model in my mind on what client side processing would exactly do. Note that selectors can do refinements of other selections (see, e.g., Example for 3.9. in the note).

However... the URI fragments are just a reflection of the selector model. As far as I am concerned, what really counts is the selector model, the URI syntax in the note is 'just' an (ugly!) syntax thereof. I would be wary to define a processing of the fragment id whose processing would go beyond the selector model.

(B.t.w., I consider the ugliness of the fragment ID-s as a major issue in the approach. Maybe a traditional fragment ID is not the right tool; it is not by coincidence that the Working Group has not put that approach into Rec Track... Maybe a shorthand for some of the frequently used selectors may be a good idea if we find something feasible.)

Would using a syntax like NYT's Emphasis create shorter hashes that are (more importantly) less fragile to page changes: https://open.blogs.nytimes.com/2011/01/11/emphasis-update-and-source/

Also, there's a video about how Genius do fuzzy annotation anchoring that describes how they use the Bitap algorithm, specifically the implementation in Google's diff-match-patch library. Are you using that library in any way for working out where your annotation anchors are?

Emphasis hashes are not less fragile; they depend on very specific characters not changing in sentences of a paragraph. Having more complete text to compare is always going to give more options to fuzzy match or highlight what may have changed. Emphasis would happily give you a link to a sentence in which a 'not' had been inserted to change the meaning, for example.

dwhly commented

Also, there's a video about how Genius do fuzzy annotation anchoring that describes how they use the Bitap algorithm, specifically the implementation in Google's diff-match-patch library. Are you using that library in any way for working out where your annotation anchors are?

I believe that Genius' (uncredited!) version of our Fuzzy Anchoring implementation also used @tilgovi's Text Quote Anchor module (which uses a fork of Google's diff-match-patch). Interestingly, the 32 character search length of bitap happily overlapped with great work that @kurzum did to determine the selectivity and stability of various lengths of strings in English over a random selection of 100 articles with > 500 edits from the Wikipedia corpus.
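For readers who have not seen fuzzy anchoring, here is a small illustrative sketch. To be clear, it is not Bitap and not diff-match-patch: it swaps in Python's standard `difflib` as a stand-in, and the function name and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Optional


def fuzzy_anchor(text: str, quote: str, threshold: float = 0.8) -> Optional[int]:
    """Guess where `quote` starts in `text`, tolerating small edits.

    Seed-and-verify: find the longest block shared by text and quote,
    infer the quote's start from it, then accept the guess only if the
    window at that position is similar enough to the quote.
    """
    matcher = SequenceMatcher(None, text, quote, autojunk=False)
    seed = matcher.find_longest_match(0, len(text), 0, len(quote))
    if seed.size == 0:
        return None  # nothing in common at all
    start = max(0, seed.a - seed.b)
    window = text[start:start + len(quote)]
    ratio = SequenceMatcher(None, quote, window, autojunk=False).ratio()
    return start if ratio >= threshold else None


# The page originally said "jumps"; it has since been edited to "jumped".
edited_page = "The quick brown fox jumped over the lazy dog."
anchor = fuzzy_anchor(edited_page, "quick brown fox jumps")
```

The stored quote no longer appears verbatim, but `anchor` still lands on the start of "quick", which is the kind of resilience to change the longer anchors buy you.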

The other antifragile thing about long text anchors is that they convey enough of the linked-to text to be coherent in themselves, and so allow even broader fallback strategies, such as searching other document archives. My original Stoppard quote has a Google fallback example.

I'm not entirely sure how much this overlaps with Web Annotations as a whole. "Fragment Selector" is already handled by just #some-fragment in the URL, and this really just feels like sugar for specifying a "target" selector in the URL. Given it's just that one bit and it doesn't contain certain useful things like selecting the nth occurrence, I'm not convinced we should be bound to it so much as just seeing it as a starting point.

Hi folks, sorry for the delayed response - I got a little side tracked.

Thanks for all the links and discussion. I've done some more reading and feel like I understand the prior art much better now.

The Open Annotation Community Group would be a great place to start.

Additionally, surfacing future Web Platform related Chrome features at the [Web Incubator Community Group]...

The goal was always to move this into WICG; the HackerNews post caught me a little off guard so this became public a little early. I'll try to get the conversation started in WICG this week, time permitting. Regarding the Open Annotation CG, I'll definitely reach out (I've also spent some time reading the relevant texts). I'm wary of broadening the scope of this proposal too much but, though the goals are somewhat different, there's definitely significant overlap.

From https://news.ycombinator.com/item?id=19169698 :

I urge you to dig a little deeper and see why things like prefix, exact, suffix exists in that particular example:
https://www.w3.org/TR/selectors-states/#TextQuoteSelector_def

From working with teams at Google I've come around to the same conclusion. Being able to match short non-unique strings (e.g. table headers, headings) is important so surrounding context seems necessary.

many people referred to the (informal) selector-to-URL mapping scheme, but I have not seen a URL for it, so here it is:

https://www.w3.org/TR/2017/NOTE-selectors-states-20170223/#frags

Thanks! That's very useful. I actually think

http://example.org/page1#selector(type=TextQuoteSelector,exact=annotation,prefix=this%20is%20an%20,suffix=%20that%20has%20some)

is a verbose but extensible and existing version of what I have in mind at this point.

I would be wary to define a processing of the fragment id whose processing would go beyond the selector model.

Just spitballing here: As mentioned above, TextQuoteSelector seems like the most useful of the selectors. What if we added to the HTML fragment processing model (the fragid structure?) that if the fragment doesn't match an id, we interpret it as a TextQuoteSelector using the following shorthand:

http://example.org#prefixText,exactText,suffixText

In this case, we would highlight only the exactText, the prefix and suffix being used only for context to disambiguate matches. We could, of course, additionally allow specifying a full selector(type=.... fragment, making it extensible in the future.
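To spell out the fallback order I have in mind, here is a hedged sketch: an id match wins, otherwise a three-part fragment is read as prefix, exact, suffix. The function name and the return shape are made up for illustration, and what to do with fragments that have one or two parts is left open (the sketch just returns None).

```python
from typing import Optional, Set, Tuple, Union
from urllib.parse import unquote

Interpretation = Tuple[str, Union[str, dict]]


def interpret_fragment(fragment: str, element_ids: Set[str]) -> Optional[Interpretation]:
    """Interpret a fragment (without the leading '#').

    1. If it names an element id on the page, behave as today.
    2. Otherwise, if it has exactly three comma-separated parts, treat it
       as shorthand for a TextQuoteSelector: prefix,exact,suffix.
    3. Anything else is left uninterpreted (None).
    """
    if fragment in element_ids:
        return ("id", fragment)
    parts = [unquote(part) for part in fragment.split(",")]
    if len(parts) == 3:
        prefix, exact, suffix = parts
        return ("text-quote", {"prefix": prefix, "exact": exact, "suffix": suffix})
    return None


shorthand = interpret_fragment(
    "this%20is%20an%20,annotation,%20that%20has%20some", {"intro"}
)
```

Only the `exact` value would be highlighted; `prefix` and `suffix` are carried along purely to disambiguate.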

I think there's two things that this still doesn't satisfy:

  • Being able to highlight multiple quotes. I think this is important for cases where the text is discontinuous. For example, highlighting multiple cells in a table, or two paragraphs separated by an image. Given @westurner's point:

https://www.w3.org/TR/2017/NOTE-selectors-states-20170223/#frags says:

Note that this representation is valid only if the IRI for the Source does not contain a fragment identifier of its own (an IRI may contain at most one fragment identification).

it seems like this might require some flexibility. I can think of convenient syntax (e.g. '&' to separate either the shorthand or the selector() terms) but that would contravene the above note.

  • Selecting long quotes. It'd be nice if we could add the start/end syntax to the TextQuoteSelector so that selecting long passages is possible without extremely long URIs. Personally, I feel strongly that a user-readable and usable format has many advantages.

For anyone following along, I've just posted this on the WICG discourse, feel free to reply there. Assuming there's interest, we can move this repo into the WICG org.

I think there's two things that this still doesn't satisfy:

#!route/two&selector=(type=,)
#!route/two#selector=(type=,)
#!route/two;;selector=(type=,)

But the current selector URI encoding does not include a key/value syntax (which may already be specified in a different W3C spec?):

#selector(type=,)

... This would be a third thing to include in a summary on discourse; along with a link to this issue: https://discourse.wicg.io/t/proposal-allow-scrolling-to-a-specified-text-snippet-in-a-navigation/3442

Done, I've summarized those on the thread.

I'm not super familiar with the WICG process yet but my impression was if there's interest in the discourse thread we'd move this repo to the WICG's GitHub and continue the discussions in GitHub issues, right?

btw, the SPA issue I think is orthogonal to the proposal. Correct me if I'm wrong but the same issue applies for id-based fragments. So I don't think we'd need to block on fixing that. Though we may want to fix it in parallel while we've got the specs opened up for edit.

@bokand

btw, the SPA issue I think is orthogonal to the proposal. Correct me if I'm wrong but the same issue applies for id-based fragments.

Agreed, but I disagree it's fully orthogonal. Some of the other issues that could arise from an ID-based formulation that isn't censored:

  • It's not quite as controllable as fragments in general:
    • The text might be user-generated, so it could in theory scroll to text the developers had no idea existed.
    • The text might be in a closed shadow root, so you would never be able to reach the element in the first place to scroll to, without native APIs not directly exposed to JS. (Accessibility tools have already had issues with this, being unable to read the text by just traversing the DOM.)
  • There could be both a dynamic fragment (maybe selecting which tab in a container to open) and a search string. This isn't unique to SPAs - simple jQuery tab containers are arguably a more common situation with even greater variation.

So I don't think we'd need to block on fixing that. Though we may want to fix it in parallel while we've got the specs opened up for edit.

Not saying the whole proposal should be blocked - a possible JS API for selecting and searching (which IMHO should exist) could evolve independently of this mod URL-related stuff. However, the URL structure itself I'm not convinced could without complications arising.

I think trying to specify additional fragment syntax is unnecessary. SPAs can already break the ID-based fragment selection. Providing a way to keep text selection outside the SPA seems at least like something that can be addressed later, if at all.

Not saying the whole proposal should be blocked - a possible JS API for selecting and searching (which IMHO should exist) could evolve independently of this mod URL-related stuff. However, the URL structure itself I'm not convinced could without complications arising.

I think exposing the text selection API would not only allow SPAs to use it as part of their routing, but I imagine it would also be useful in other contexts.

I am much more interested in exposing the API than creating a special fragment separator.

@tilgovi Agreed, and it might be better to direct people to, in the face of async loading:

  1. Save the search parameter and clear it synchronously, to block automatic execution of the search.
  2. Perform requisite initialization, resource loading, etc.
  3. Once all requisite data is loaded, execute the search.

(Admin comment)

@bokand

I'm not super familiar with the WICG process yet but my impression was if there's interest in the discourse thread we'd move this repo to the WICG's GitHub and continue the discussions in GitHub issues, right?

There are no rules cast in concrete. Referring to this repository/issues from the WICG entry (which you did) and continuing the discussion here is perfectly fine for now. If this line of discussion leads to a proposal for some more specific W3C work (e.g., API, or moving the fragment id forward, etc.) then, at that point, we may have to move the repo to the W3C, but we are not there imho.

It's also possible to get a WICG repo set up for this spec as well. It might also be good to coordinate with the similarly intentioned https://github.com/bryanmcquade/scroll-to-css-selector project. The hope being that we can collectively arrive at a shared style/structure for the non-id-based identifier space--and find one that's extensible for that next batch of use cases. 😃

Excited this is moving forward. Thanks for kicking it off @bokand!

Just to clarify, this project is the scroll-to-css-selector, or what it evolved into :). We found the security issues related to CSS selectors to be more difficult than we imagined so we've decided to try text selection instead. But I agree a successful outcome would be specifying a way to target other types of identifiers for the future (but focusing on the text use case for now).

We found the security issues related to CSS selectors to be more difficult than we imagined so we've decided to try text selection instead.

Could you (or someone) elaborate on this a bit more? Might be best done in a separate issue, I suppose. 😃 It would definitely help the annotation community (who's likely a frequent user of whatever comes out of this work) to understand what did and did not work in past explorations.

Much thanks!
🎩

Could you (or someone) elaborate on this a bit more?

The main point here was that causing scrolling is likely to be detectable across origins. If you can detect that the target scrolled (this, in addition to known timing attacks), you can detect that the selector matched and so you can exfiltrate arbitrary content from the page like XSRF tokens. We looked at restricting the CSS selector syntax to restrict the ability to reach to "far away"/arbitrary places in the DOM but that was becoming more and more complex while losing the main benefit of being able to use standard CSS selectors.

Text selectors have similar issues but text tends to be less catastrophic than something like an XSRF token. It's also easier to restrict, e.g. restricting searches to word boundaries to prevent repeated guessing of a visible password.
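As a simplified illustration of the word-boundary idea (a sketch on a flat string, not what any browser implements; the function name is hypothetical), a match is only allowed when the query is not glued to a word character on either side:

```python
import re


def matches_on_word_boundaries(text: str, query: str) -> bool:
    """True only if `query` occurs in `text` as whole word(s).

    Lookarounds are used instead of \\b so that queries starting or
    ending with punctuation are still handled sensibly.
    """
    pattern = r"(?<!\w)" + re.escape(query) + r"(?!\w)"
    return re.search(pattern, text) is not None


page = "password: hunter2"
whole_word = matches_on_word_boundaries(page, "hunter2")
partial_guess = matches_on_word_boundaries(page, "hunt")
```

An attacker who already knows the whole word learns that it is present (`whole_word` is true), but cannot grow a guess one character at a time, because `partial_guess` is false.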

bokand commented

Closing out old issues - I don't think there's anything actionable in this issue but feel free to file a new issue