Expressing Locator ranges for highlights
mickael-menu-mantano opened this issue · 46 comments
In the context of building the Highlight Navigator API, we need a way to express a selection range with Locators
.
We can't use a single Locator (as the model is now) because:
- not all the fragment types can express a range,
- a range can span two documents (eg. FXL spreads) so we need two hrefs
Using two Locators
is not great because the selection text (LocatorText
) is duplicated and it's not obvious which one should be used.
A potential solution would be to have an additional LocatorRange
object that contains two Locators
with empty .text
and an additional "standalone" LocatorText
that represents the content of the range.
LocatorRange {
start: Locator
end: Locator
text: LocatorText?
}
The .text
is optional because not necessary when passing the Highlight
to the navigator, and for protected book it would be empty (to avoid bypassing the copy rights).
Any thoughts, alternative solutions?
The electron/desktop navigator currently "augments" the Locator data structure, not by extending it (strictly-speaking), but via composition:
LocatorExtended
composes Locator
and SelectionInfo
(as well as a few other useful properties). SelectionInfo
is a JSON-friendly serialization format for a DOM Range, i.e. a "start" and an "end" pair of positions within a document (possibly at the character / TextNode level).
The Locator
refers to the "start" position of the selection, in document order. In the current implementation, the resolution of the Locator
is limited by the usage of CSS Selectors (i.e. DOM Element), even though the selection itself affords character-level precision. I will harmonize this in a future revision, when the APIs stabilize.
A word about the "visual mapping" technique: right now, in order to "turn to the correct page / scroll to the corresponding offset" when a Locator
link is followed (e.g. bookmark, selection, TOC item, etc.), the navigator relies on "client rectangles" coordinates (which is more reliable than the bounding box approximation, or scroll offset, due to CSS box model intricacies). The "beginning" of a targeted DOM element can be located on one page (i.e. column), but its end might be on the next page, which may be out of view (e.g. clipped paragraph, spills beyond the visible viewport). With character-level positioning extracted from the SelectionInfo
, it is possible to use the same "client rectangles" coordinate system, in order to resolve the most accurate visual reading position. For example: the selected text might be a short fragment at the end of a large paragraph, so with CSS Selectors the resolved position and the user's expectation would likely be out-of-sync (bad, confusing UX).
Follow-up: because Locator
and SelectionInfo
do not correspond to the exact same document location, strictly-speaking there is no redundancy in the text
property. But there is obviously some degree of redundancy in the expression of the actual "reading location", due to the different resolutions.
I've seen the SelectionInfo
and RangeInfo
models, but they are really tied to HTML. We need to come up with a solution that would work for all publication formats in the API.
Ideally, we would have a way to express the RangeInfo
directly inside the Locator
, maybe through a custom fragment (eg. [cssSelector]/index/offset
).
Here is the relevant part in the doc where I mentioned this: https://docs.google.com/document/d/1W_kbSRve7c1ZtyYTZ-fbEOsFZPTqxN0IV7UdvH_h20U/edit#bookmark=id.jc195922qtyf
If we decide to implement a per-format custom model like SelectionInfo
, we need to specify it so that it's implemented the same way on each platform, otherwise the annotations won't be cross-platform.
In my opinion, Locator
already has everything needed, if we figure out a fragment format that contains the data from SelectionInfo
. But maybe I'm missing something, Daniel?
I agree with you :)
I was just doing a recap / status report, in order to highlight the limitations of the current ad-hoc approach.
Regarding RangeInfo: although I am personally satisfied with the ad-hoc JSON-friendly serialisation format used in the TypeScript code of the R2 desktop SDK (navigator), I would like to hear more from EvidentPoint / Juan who have explored alternative DOMRange marshalling solutions in their GlueJS experiments. Perhaps there are certain pros/cons we should carefully examinate before committing to a particular solution.
Using two
Locators
is not great because the selection text (LocatorText
) is duplicated and it's not obvious which one should be used.
Some additional thoughts on that:
- text is not required in a Locator, which means that in use cases where we need two of them, we could simply avoid including text at all
- for large ranges, there are potentially issues that are more legal than technical (a large range could be seen as a way to extract text out of a publication)
The .text is optional because not necessary when passing the Highlight to the navigator, and for protected book it would be empty (to avoid bypassing the copy rights).
This raises a concern. We want Locators to be useful on their own, if they ever become orphans (publication is updated or no longer available). We need to find the right balance between having an empty text highlight and respecting the restrictions expressed by rights holders.
text is not required in a Locator, which means that in use cases where we need two of them, we could simply avoid including text at all
This sounds like the solution with the LocatorRange
object. Unless you mean that we don't pass the text selection at all, and just have a tuple (start: Locator, end: Locator)
? In which case, how does the host app get the selection text? We need it to present the highlights outside of the book, or in an annotation panel inside the book.
for large ranges, there are potentially issues that are more legal than technical (a large range could be seen as a way to extract text out of a publication)
Maybe a hard limit for the length of text.highlight
could be added. If we have the range and enough text context to be displayed in a list of highlights, then that's probably enough. Of course the highlight requires then the book to get its full content, and exporting the highlights in another form wouldn't work (eg. PDF export of the user notes).
And/or that could be part of the LCP app compliance tests that the user can't export or copy the text of a highlight extracted from an LCP protected book.
Fragments in the currently-proposed Locator model:
https://github.com/readium/architecture/blob/master/locators/README.md#fragments
W3C web annotations selectors:
https://www.w3.org/TR/annotation-model/#selectors
We know from implementation experience (currently in R2 SDK for desktop/Electron) that DOMRange can be reliably serialized into a text format / JSON-friendly data structure. This is a direct translation of DOMRange without the pitfalls of EPUB CFI (which can be seen as a useful interchange syntax, but not convenient / performant for the internals of R2 features). Basically, we cannot reference a DOMRange TextNode directly, so we encode the parent Element reference using a quasi-canonical / unique CSS Selector (as already used for Locators in R2 desktop/Electron), and we record the zero-based index of the child TextNode. Other than that, the data structure is exactly the same as the DOMRange object model (start, end, offset, etc.).
So, one option is to "flatten" this tuple notation into a single string so that it fits within the current Locator Fragment proposal (array of strings, reflecting different resolutions applicable to different media types).
Personally, I would very much prefer to preserve the "exploded" representation of the range's start and end locations. Instead of having to guess the type of Locator Fragment using some kind of parser (e.g. regular expressions), we would ; like W3C annotations selectors ; identify the fragment types using a predefined vocabulary ... but then we might as well encode JSON property names directly, for example:
myLocator.locations.fragments.cfi = "..."
myLocator.locations.fragments.cssSelector = "..."
myLocator.locations.fragments.domRangePos = { cssSelector: "...", textNodeIndex: "...", offset: "..." }
... instead of:
myLocator.locations.fragments = [
{ type: "cfi", value: "..." }
{ type: "cssSelector", value: "..." }
{ type: "domRangePos", value: { cssSelector: "...", textNodeIndex: "...", offset: "..." } }
]
Thoughts?
Then, we indeed have the problem of myLocator.text[before|after|highlight]
, more specifically myLocator.text.highlight
which applies to a range, i.e. a tuple of two Locators, one of start, the other for end (if we chose to represent a DOMRange using two such Locators).
https://github.com/readium/architecture/blob/master/locators/README.md#the-locator-text-object
It seems that we got a consensus yesterday on the highlights model, but before we close this issue, I feel I must challenge one of its hypothesis: "a range can span two documents (eg. FXL spreads) so we need two hrefs".
The reasons why I'm challenging it is that if it was not a requirement, we could extend the expressivity of some fragment expressions and get ranges in a simpler way. Already, cfis, media fragments (especially time based, but also rectangles in a sense), html ids to a certain extent can represent a text range or an audiovisual segment. It would then suffice to extend css selection expressions and we would get ranges in a single Locator.
It is true that Acrobat allows selecting text across PDF pages (but for Acrobat these may not be different "resources". But I tried iBooks and it does not allow selecting text across resources. I tried Aldiko iOS and could not select text across CSS columns.
So, are we sure we are not creating a complex model with no real requirement?
I think for EPUB it might really not be needed, since content will rarely be split between two resources (maybe for some FXL layed out like a PDF with split paragraphs). But more importantly, I don't think we can make the native selection work properly (or at all) between two webviews or even two iframes. The only way to make this work would have to either build the selection from scratch with overlays or have a convoluted UX to merge two highlights.
For PDF we don't need it either since a single PDF document is one href
.
Personally I'd be more comfortable having a single Locator
object representing the full range, including the text
. But if we go down this road we still need a way to express the range in one single Locator
. Having a custom range fragment format would work, but we then lose the expressiveness of the other discrete location formats. Also, the Locations
object contains additional properties that are by nature for a discrete location: position
, progression
. In which case what does it represent? The beginning of the range, the center?
In this context, I suggest again my initial proposal which is to have Locator.locations
be an array of Locations
objects instead of a single one.
- If the array is empty, it's a Locator to the global resource (like currently having a null
locations
) - With one object, then it's a Locator to one discrete location
- With two objects, then it's a range for highlights
- With more objects, it represents a path that could be used for example to represent a drawing on a page
@mmenu-mantano I like this last proposal but in such a case, what to do with fragments which can already represent a range, like "t=1,10"? Do we state that the must not be used and that we consider only fragments which represent a single "point" (in space of time)?
And how do we justify that we don't take the same approach as Web Annotations selectors, in which a single fragment can represent a range?
Note that even a css id is mapped to a "segment", not a single point in the text. Same for xpointer, xywh, even page for pdf.
Yes usually range fragments can also express a discrete location. I think it makes sense to ignore ranges in a single Locations
object since the other properties (or other fragment formats) can't express a range.
Also, this problem is there too if we use a containing structure with two Locators
like the aforementioned LocatorRange
.
Maybe it doesn't answer all formats but if a fragment is mapping to a rectangle (eg. a PDF page) then the top-left (depending on reading progression I guess) is the discrete location in this case.
I'm not familiar with the Web Annotations selectors, but if they offer a better solution I'm all for it, all the current solutions feel awkward in the Locator
model.
If we ignore the case of a range expressed over two different resources, then there's no need to extend the Locator
model at all:
- in an audiobook, a range can be expressed using
t=10,45
- in a visual publication, a range can be expressed using a rectangle with
xywh=160,120,320,240
- in an EPUB, a range can be expressed using a CFI, an XPath or any other fragment capable of expressing a range on a tree structure
Also, the Locations object contains additional properties that are by nature for a discrete location: position, progression. In which case what does it represent? The beginning of the range, the center?
IMO this should be the beginning of the range, since that's where you jump to when you want to present a range.
I'm not familiar with the Web Annotations selectors, but if they offer a better solution I'm all for it, all the current solutions feel awkward in the Locator model.
The Web Annotation model is very flexible, IMO a little too much for its own good.
As I said in my previous comment, it defaults to a single "selector" (equivalent of our locations) using either: fragment, XPath, CSS or text.
If the range can't be easily defined that way, it uses a specific "range" selector. If we ported that to our model, it would look like that:
{
"href": "http://example.com/track6",
"type": "audio/ogg",
"title": "Chapter 5",
"locations": {
"progression": 0.607379,
"totalProgression": 0.50678,
"range": {
"start": {
"fragments": ["t=389.84"]
},
"end": {
"fragments": ["t=529.52"]
}
}
}
}
I think this is not necessary in our case and I'd much rather rely on a single location to express range.
That's fine with me, so we just need a range fragment for EPUB that we can use in R2 (CFI is out by consensus).
It looks like Daniel's RangeInfo
model fits the need, we just need a way to encode it as a fragment.
Do you have any opinion on Daniel's comment Hadrien? #95 (comment)
If we want to keep the semantics without changing too much the JSON model, we could also use the PDF fragment approach with a query string:
locations.fragment = "selector=div[3]&index=3&offset=25"
EPUB CFI is indeed fit for purpose when it comes to expressing DOM position and range, as a standardized interchange format across systems.
However, in my opinion (and based on implementation experience) generating and parsing CFI references is error-prone on the border-cases, and it is also woefully inneficient.
Right now in the R2 desktop/Electron implementation an ad-hoc serialisation format is used to reliably and cheaply marshal DOMRange objects. Basically, a CSS Selector is combined with a child index, and a character offset (this 3-tuple is used for both the start and end positions). This is as close as it can possibly get to a native DOMRange, resulting in extremely efficient processing.
It would be a shame to flatten this simple JSON-compatible data structure into a string "fragment" inside the Locator payload. I would much prefer transporting this information across the reading system layers (typically: app <--> navigator) in its raw form, rather than unnecessarily massaging it just to fit into a prescribed Locator format (which would require awkward escaping rules in order to cater for the fragment's own syntax, delimiters etc.).
That is not to say that there is no value in the flattened "fragments", which are indeed naturally amenable to URI encoding (e.g. like the elegant Media Fragment t=0,99 for time ranges), and therefore desirable for interchange purposes.
We can certainly produce interoperable position/range references a-la W3C selectors, whenever necessary (for example when such Locator payloads are exported into non-R2 systems, like an external annotation service). But for the subsystems involved in R2 reading systems (navigator, etc.) I believe that we should stay as close as possible to native web browser engine constructs like DOMRange and CSS Selector, which yield the best performance and readability (have you tried debugging CFI?). As for XPath, I guess you mean XPointer scheme which would offer similar capabilities to CFI, but also similar headaches.
Thoughts?
Do you have any opinion on Daniel's comment Hadrien? #95 (comment)
I think it's slightly better than the Web Annotation approach, but I'm not a big fan of having a controlled vocabulary for fragments. IMO fragments should be registered alongside a media type (which is usually the case) and we should be able to express potentially any fragment, not just a list of well-known fragments.
Keep in mind that I'm mostly discussing the core model and its JSON serialization, what Daniel suggested IMO makes perfect sense in the context of a specific implementation for example.
@danielweck Are you mentioning this for your internal Locator
model, or for the JSON serialization that will be transferred between platforms and stored in the host app database?
I think we should be designing the format of the fragment that will be stored in the Locator JSON here (https://github.com/readium/architecture/tree/master/locators). Each platform can then implement helpers around the Locator
model to have the locator expressed in a more convenient way (eg. as a DOMRange).
JSON serialization that will be transferred between platforms and stored in the host app database
I do not understand the nuance of this statement. But let's take platform "desktop / Electron" as an example (bearing in mind that the same reasoning applies to any coherent reading system platform, like an iOS or Android app, or even a web app involving its own client/server code). The key subsystems that exchange Locator data back-and-forth
(as per the specification we are discussing) are the navigator and the host app. For example: the app receives information about the reading locations (then stores bookmarks either as native Locator + additional info, or as totally bespoke database format), and the app can subsequently instruct the navigator component to restore such reading locations by passing-in the standardized Locator payload. Same with selections/highlights, search module, etc. That is la "raison d'être" of such normalization effort.
The Locator object (herein discussed) transports this information across the navigator / app boundary in a standardized fashion, not only syntax-wise (typically in a serialised JSON form, due to technical constraints), but also in terms of the processing model (e.g. fragments that express different resolutions, precedence order). This works great for things like progression, position, CSS Selectors, time Media Fragments, which are quite consensual.
But here we are now discussing the more complex use-case of precise character-level document range, for which CFI was a semi-successful attempt to offer a reliable consistent model, and XPointer scheme never took off either.
In a platform-specific implementation context (e.g. the iOS, Android, Electron/desktop apps), why would a navigator ever bother creating and consuming CFI or other supplemental fragment types, when they have no concrete use for them? This is why in desktop/Electron we currently "extend" the Locator definition by means of composition (additional proprietary / ad-hoc data structure) in order to fill the gap where standard Locator is not satisfactory.
Now, for "interchange" purposes across different system implementations ; e.g. a streamer server and multiple platform-specific clients ; the Locator payload obviously need to be standardized, just like any other part of the architecture. What I am suggesting is that in the real world, implementations will not bear the burden (performance impact, debuggability, readibility, etc.) of creating processors and executing their code (for example, producing CFI fragments when the receiving end is known to ignore them completely) unless there is a need for generating and consuming the proposed multi-resolution/granularity, "flattened" Locator fragment syntax ... thereby favouring instead the extended / proprietary data structure.
Surely, we want to mitigate this? I can see the cons of incorporating a totally bespoke DOMRange structured serialisation format in the Locator model (which would be at odds with the dominant "linearized string" fragment approach), but I also want to make sure we examine the pros too. For me, the benefits in terms of performance, testability, readability etc. are not insignificant. I would hate to continue to use a totally different system alongside standard Locators (in the Electron/desktop implementation), when there is in fact an opportunity for us all to leverage the same structured syntax for selection/annotation ranges.
Do you understand my viewpoint? (sorry for the long-winded explanation)
@danielweck I think that we're having two separate discussions here:
- changing the fragment into an object (instead of a string currently)
- adding support for DOMRanges
IMO changing how fragments are represented has a lot of downsides and so far, only DOMRanges would benefit from this.
DOMRanges can already be added to a Locator
right now using a string and flattening the representation (selector=div[3]&index=3&offset=25
) or simply by using three strings (["selector=div[3]", "index=3", "offset=25"]
).
As an alternative, we can also define an extensibility model for locations
which would allow something like:
{
"href": "http://example.com/chapter1",
"type": "text/html",
"title": "Chapter 1",
"locations": {
"position": 4,
"progression": 0.03401,
"totalProgression": 0.01349,
"domrange": {
"selector": "div[3]",
"index": 3,
"offset": 25
}
}
}
[...] why would a navigator ever bother creating and consuming CFI or other supplemental fragment types, when they have no concrete use for them?
I'm not buying the argument that CFI is completely going away.
It's used in EPUB.js and Vivliostyle Viewer, which both support RWPM. It's also the core fragment being used in Colibrio and remains an important focus for Evident Point.
I don't think it's going to be that uncommon for organizations to run a mix of Web Apps that are not primarily developed by Readium (yet compatible with RWPM) with mobile/desktop apps that are built by Readium.
Keeping the fragments as strings is really important in my opinion:
- A fragment identifier is supposed to be a string of characters
- We can use them with URI using #
- Having a JSON array with multiple possible types makes it much more complicated to parse on a more rigid language such as Swift
So adding a (standardized across R2 platforms) Locator extension for the DOM range could really be a solution. Especially since there's no standard fragment format to represent a DOM range and so this is likely a private Readium implementation.
But keeping everything in fragments
would make it easier in the JS layer to fallback on various fragment formats when available, to find the target location. We might not be interested in using CFI as our default implementation, but someone might want to add support to read annotations exported from other SDKs.
Note that on mobile we will most likely not forward the full JSON locator to the JS layer, since it will only be interested in the fragments and/or the DOM range.
I would hate to continue to use a totally different system alongside standard Locators (in the Electron/desktop implementation), when there is in fact an opportunity for us all to leverage the same structured syntax for selection/annotation ranges.
Definitely, especially to add additional data that should be encoded into the Locator
. However, wrapping a JSON locator into an object that adds helper to read the various data, and maybe build an actual DOMRange
in a provided document, might be helpful thing to answer the per-platform requirements. This is akin to the native Locator
model that we have on Swift/Kotlin. The line might be blurrier on a JS platform I guess.
We really want to close this issue tomorrow, do you have any additional comment @danielweck ?
I do actually :)
In your example:
{
"href": "http://example.com/chapter1",
"type": "text/html",
"title": "Chapter 1",
"locations": {
"position": 4,
"progression": 0.03401,
"totalProgression": 0.01349,
"domrange": {
"selector": "div[3]",
"index": 3,
"offset": 25
}
}
}
...I think we need to distinguish start
and end
:
{
"href": "http://example.com/chapter1",
"type": "text/html",
"title": "Chapter 1",
"locations": {
"position": 4,
"progression": 0.03401,
"totalProgression": 0.01349,
"domrange": {
"start": {
"selector": "div[3]",
"index": 3,
"offset": 25
},
"end": {
"selector": "div[4]",
"index": 2,
"offset": 11
}
}
}
}
Also, as the proposed locations.domrange
JSON property is a "first class citizen" on par with locations.totalProgression
/ locations.progression
/ locations.position
, does this mean that DOMRange information does not need to be serialized (and flattened) into the locations.fragments
array of string values? (see https://github.com/readium/architecture/blob/master/locators/README.md#fragments )
I filed a separate issue to express a more general concern about the concept of Locator "fragments": #98
Also, as the proposed locations.domrange JSON property is a "first class citizen" on par with locations.totalProgression / locations.progression / locations.position, does this mean that DOMRange information does not need to be serialized (and flattened) into the locations.fragments array of string values? (see https://github.com/readium/architecture/blob/master/locators/README.md#fragments )
Actually, this is not my personal preference nor the only solution that I've listed in my previous comment.
Overall, I think that the current approach of an array of strings is perfectly fine and works for what you've requested.
That said, we could use an extensibility model for both the top level of a Locator and for locations
. The example with domrange
was only meant to illustrate how this could work if we had such extensibility available.
Overall, I'm not in favor of extending our current model with anything too specialized, I think the mix of position
, progression
, overallProgression
and fragments
is good enough:
- it's as media-generic as we can be
- it provides both context and the ability to express very specific locations
- there's built-in extensibility through fragments (that can be defined per media type)
We had something more specialized elements before (with dedicated elements for CFI and CSS Selectors among other things) and I think it would be a step back to go back in that direction.
Progress update based on our latest conference call:
#99
Here let's discuss the specifics of domRange
.
Full example:
{
"href": "http://example.com/chapter1",
"type": "text/html",
"title": "Chapter 1",
"locations": {
"progression": 0.8,
"fragments": [
"#paragraph-id",
"t=15"
],
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"partialCfi": "/4/2/8/6[paragraph-id]",
"domRange": {
"start": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"textNodeIndex": 0,
"offset": 11
},
"end": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"textNodeIndex": 0,
"offset": 22
}
}
}
}
Partial example (so we can focus on property naming, and meaning):
"domRange": {
"start": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"textNodeIndex": 0,
"offset": 11
},
"end": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"textNodeIndex": 0,
"offset": 22
}
}
locator.locations.domRange.start
and locator.locations.domRange.end
have the same object structure, which is made of 3 mandatory fields: cssSelector
, textNodeIndex
, textOffset
.
locator.locations.domRange.start
is mandatory and can be used on its own (i.e. collapsed range with same begin/end pointer). locator.locations.domRange.end
is optional.
cssSelector
(string): always references a DOM element, but if the original DOMRange start/end references a text node instead, thetextNodeIndex
field (described below) is used to complement the CSS Selector.textNodeIndex
(-1
, or integer[0, n]
): when-1
, the abovecssSelector
corresponds to the DOM element of the original DOMRange start/end. Otherwise,[0, n]
represents the zero-based index of the text node child (of the DOM element referenced by the abovecssSelector
).offset
(integer[0, n]
): if the original DOMRange start/end references a DOM element, then this zero-based index represents the position of a child text node of that element. If however the original DOMRange start/end references a text node, this represents the position of a character within that text node.
Note that in [0, n]
, n
is indeed included even though it represents the total number of children. That is because index overflow is allowed, and it signifies a pointer after the last item, but before the next sibling element starts (just like XPointer and CFI edge-cases ... literally "edge" cases).
Also note that sometimes, mobile iOS/Android selection UX always normalize the original DOMRange start/end so that it never references DOM elements (always text nodes). This normalization step can also be explicitly added in desktop implementations (I am working on that, actually, based on the work of Apache annotator).
See TypeScript prototype implementation which inspired this proposal:
https://github.com/readium/r2-navigator-js/blob/7a2ddc8287659d6d985eb930b776e108891a3510/src/electron/common/selection.ts#L14-L33
Thank you, Daniel. The naming is very clear.
I have two questions/suggestions:
-
Would it make sense to use this
{cssSelector, textNodeIndex, offset}
object too instead of the singlelocations.cssSelector
? I guess it could be useful to point to a specific position in a long paragraph spanning more than one page. -
Rather than -1 for
textNodeIndex
, I would rather useundefined
(not setting the key in the object) to avoid ambiguity (-1 is a bit reminiscent of good ol' C but nowadays might be seen as "the first text node starting from the end"). Unless this value is directly used when creating aDOMRange
?
Also note that sometimes, mobile iOS/Android selection UX always normalize the original DOMRange start/end so that it never references DOM elements (always text nodes). This normalization step can also be explicitly added in desktop implementations (I am working on that, actually, based on the work of Apache annotator).
We really need to make sure that those locators are compatible when shared between R2 platforms. I think it's not such a problem if the output is different (not canonical) as long as when used from a different platform we end up at the exact same place in the DOM.
- I have a personal preference for keeping
locations.cssSelector
separate (simple string instead of object), which saves the consumer checking the presence of extra fields just to determine whether the locator references a "standard" CSS Selector (de-facto element) vs. our ad-hoc encoding method for DOMRange start/end = [element reference (cssSelector
) + non--1
textNodeIndex
+ characteroffset
] or [element reference +-1
textNodeIndex
+ text node index (offset
)]. - I agree that the argument that
-1
may actually be perceived as an index based on the end of the sequence of items, or even as the edge position before the first character but after the preceding sibling element. In my draft proposal, all 3 fields are mandatory, in the sense that they cannot benull
(actual value) orundefined
(no value). Theundefined
possibility fortextNodeIndex
would actually be useful to differentiate the special case oflocations.cssSelector
in your proposal (1). But if we keep the standalonelocations.cssSelector
, perhaps instead of-1
(orundefined
) we can usenull
fortextNodeIndex
?
Regarding highlights, we agreed on using a single "DOM-ranged" Locator
using the new notation (#99). This lifts the ambiguity of using LocatorText
in the range. 👍
I suggest that we use it this way:
before
andafter
are not usedhighlight
contains the cleaned (whitespaces coallesced and trimmed) text selection
I don't see any use for keeping the raw text once the Locator
is created, but we can revisit this if needed.
Additional processing could be done for highlight
depending on whether the book is copy-protected:
- Not filling the
highlight
at all - Consuming the highlight content as a selection copy in the rights (and if exceeded, then fallback on empty
highlight
) - Truncating the
highlight
up to a certain length. This doesn't really protect the book against copy but prevent copying whole sections of the book.
For non-protected books, I think truncating the highlight
could be detrimental. The host app can choose to truncate the content before storing in the database as seen fit.
-
We have to determine if the use case is useful (pointing to a character in a given text node). I think so, otherwise how would we be able to go to the middle of a long paragraph using only a CSS selector (ignoring
percentage
)? We could use a collapseddomRange.start
but it also adds additional checking and is easy to miss out in my opinion. (This additional property could be nameddomLocation
ordomPosition
) -
null
would work if the fields are mandatory.
Thinking about an alternative to avoid using offset
as a TextNode index (which seems counter-intuitive, given there is a dedicated field named textNodeIndex
), instead using offset
ONLY as a character index (which would then be optional). This updated proposal diverges slightly from the meaning of "offset" in a vanilla DOMRange object, but I think it is worth considering to make our model easier to understand.
cssSelector
(string): always references a DOM Element. If the original DOMRange start/end references a Text Node, thetextNodeIndex
field (described below) is used to complement the CSS Selector, andoffset
is used to tell a character position within that Text Node. If the original DOMRange start/end references a DOM Element, thentextNodeIndex
is used to designate the child Text Node, and thecharOffset
is not used (no explicit character position within the Text Node).textNodeIndex
(integer[0, n]
):[0, n]
represents the zero-based index of the text node child (of the DOM element referenced by the abovecssSelector
).charOffset
(null
, or integer[0, n]
):null
if the original DOMRange start/end references a DOM Element (in which casetextNodeIndex
points to a Text Node without any specific character offset within it), otherwise this zero-based index represents a character position within the Text Node designated bytextNodeIndex
.
Make more logical sense, I think, and can still roundtrip from/to DOMRange without ambiguity.
pointing to a character in a given text node
Yes, this would typically be a collapsed range. locator.locations.cssSelector
would continue to represent the "nearest" ancestor Element, useful for API consumers / processors that only support this coarse resolution. More advanced API consumers / processors can check locator.locations.domRange.start
to see if a more accurate reference is available.
I don't see any use for keeping the raw text
I have a preference for preserving "raw" text alongside "clean" text. This can be used to match actual text in the DOM (including line breaks, insignificant whitespaces, etc.), for example when reconstructing/correlating DOM text with spoken TTS utterances.
before
andafter
are not used
Why? Could that not be useful for both the "collapse range" case (discrete position) and the "spanning start/end range" case?
I'm talking specifically about the highlights for this case (eg. a bookmark would need to fill before
/after
). A collapsed selection should not be allowed in the SelectableNavigator
API, it would be a nil selection.
That being said, before/after could still be useful for highlights to add more context, especially if the highlighted part is only a few words.
I have a preference for preserving "raw" text alongside "clean" text. This can be used to match actual text in the DOM (including line breaks, insignificant whitespaces, etc.), for example when reconstructing/correlating DOM text with spoken TTS utterances.
There's nothing in the model right now that can hold raw text, but for specific use case we could expose the raw text directly in the navigator API related to TTS. Also, different API can produce different Locator
. So the locators created by the TTS could have the raw text in highlight
instead of clean. Since they are not made for display.
However, I think consistency might be more important. Maybe highlight
should always contain the raw text, and the host app would be responsible to process it for display as desired. We are not using the clean text in the R2 deps anyway.
Thinking about an alternative to avoid using offset as a TextNode index (which seems counter-intuitive, given there is a dedicated field named textNodeIndex), instead using offset ONLY as a character index (which would then be optional).
Agreed, I think your new proposal is much clearer.
One reservation: is there a practical difference between null
and 0
for charOffset
?
is there a practical difference between null and 0 for charOffset?
For all intents and purposes, no. However I remember seeing DOMRange pointing to TextNode without character offset ... I cannot remember how I came across this edge case, but I allowed 'null' / -1
to ensure precise round-tripping. But to be honest I think this should be normalized, and I am pretty sure that normalization utilities from Apache annotator etc. do this too.
Maybe highlight should always contain the raw text, and the host app would be responsible to process it for display as desired.
I think this makes sense. The "clean/normalize text" utility function should also be standardized (in the context of the R2 Locator model).
A collapsed selection should not be allowed
What about a collapsed DOMRange though? I would like to be able to use locator.locations.domRange.start
with undefined
locator.locations.domRange.end
(to reference a mouse click / text cursor inside HTML documents, for example).
I think this makes sense. The "clean/normalize text" utility function should also be standardized (in the context of the R2 Locator model).
Yes, especially if we consider that the Locator is really used to locate a precise position and not as a substitute for a Highlight
model, therefore the text excerpts should be used for that and not be stripped of meaningful whitespaces.
On the Swift native model we can easily add a dynamic property to get the cleanHighlight
, but it might not be practical on platforms sending only the JSON serialization back to the host app.
What about a collapsed DOMRange though? I would like to be able to use
locator.locations.domRange.start
withundefined
locator.locations.domRange.end
(to reference a mouse click / text cursor inside HTML documents, for example).
I was talking about the text selection for highlights specifically. If a DOM range is collapsed, then it is actually not a selection so this API would return null
: https://github.com/readium/r2-navigator-swift/pull/57/files#diff-71cffd3595869fdd8bacb309df9b5e6cR21
But for other single-location cases (eg. mouse click) then a collapsed DOMRange makes sense, and having text.before/after
might be useful.
On a side note, I'm not sure we need a Locator
for taps. It might be too costly to build it on-the-fly for each tap when the locator will most likely be discarded right away. Right now in the Swift test app we only use a point coordinate on the screen, to determine if we want to turn the page or show the toolbars. Maybe we could provide a closure to lazily build the Locator
if needed.
To summarize the current consensus on this discussion:
"locations": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"partialCfi": "/4/2/8/6[paragraph-id]",
"domRange": {
"start": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"textNodeIndex": 0,
"charOffset": 11
},
"end": {
"cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
"textNodeIndex": 0,
"charOffset": 22
}
}
}
DOM Range
domRange.start
is mandatory and can be used on its own (i.e. collapsed range with same begin/end pointer).domRange.end
is optional.
cssSelector
(string): always references a DOM Element. If the original DOMRange start/end references a Text Node, thetextNodeIndex
field (described below) is used to complement the CSS Selector, andoffset
is used to tell a character position within that Text Node. If the original DOMRange start/end references a DOM Element, thentextNodeIndex
is used to designate the child Text Node, and thecharOffset
is not used (no explicit character position within the Text Node).textNodeIndex
(integer[0, n]
):[0, n]
represents the zero-based index of the text node child (of the DOM element referenced by the abovecssSelector
).charOffset
(integer[0, n]
): Zero-based index represents a character position within the Text Node designated bytextNodeIndex
.
(@danielweck I removed the null
from charOffset
following your comment on normalization)
LocatorText
Locator.text.highlight/before/after
need to be filled with the raw text content to be able to reconstruct/correlate DOM text.- A to-be-determined standardized API to clean raw text will be provided. (eg. dynamic properties
LocatorText.cleanHighlight/cleanBefore/cleanAfter
)
Specification moved to: https://github.com/readium/architecture/blob/master/models/locators/extensions/html.md