readium/architecture

Calculating the Publication.positionList

mickael-menu-mantano opened this issue · 18 comments

We need to specify for each format:

  • How to create the positionList?
  • How to find out reliably the currently displayed position from positionList?

Related issue: Total progression in a publication for locators

CBZ and PDF

Those formats are straightforward: to build the positionList, we can directly read the number of pages for PDF and the number of files for CBZ. To retrieve the current position, we just need the index of the page.

PDF can be a bit less efficient because we need to open the file (potentially load it entirely in memory, eg. with Swift) to read its number of pages.
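For CBZ, the idea can be sketched in a few lines of Python. This is only an illustration, not Readium code: the function name, the image-extension filter, the alphabetical page order and the 1-based numbering are all assumptions.

```python
import io
import zipfile

# Assumption: only entries with these extensions count as pages.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

def cbz_positions(cbz_file):
    """Return (position, entry name) pairs, one per image in the archive."""
    with zipfile.ZipFile(cbz_file) as archive:
        pages = sorted(
            name for name in archive.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    # Assumption: positions are numbered from 1, in alphabetical order.
    return [(index + 1, name) for index, name in enumerate(pages)]

# Tiny in-memory archive standing in for a real CBZ file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("002.png", b"")
    archive.writestr("001.png", b"")
    archive.writestr("ComicInfo.xml", b"<ComicInfo/>")
buffer.seek(0)

cbz_page_positions = cbz_positions(buffer)
```

The current position is then simply the index of the displayed image plus one.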

LCPDF

LCPDF contains encrypted PDFs, so we can't really get the positionList until the license is unlocked. It might also contain several PDFs, which is not very efficient if we have to open all of them to calculate the positionList.

An alternative would be to have the number of pages as a link property for each resource in the RWPM, then it's really efficient to build the positionList and doesn't require the publication's passphrase.

The positionList is built by adding the number of pages of each PDF in the readingOrder. Here's an example implementation in Swift: https://github.com/readium/r2-navigator-swift/blob/839e0c4900a84b9e337e7a3d836f0b78c7d9c28b/r2-navigator-swift/PDF/PDFNavigatorViewController.swift#L50

We can find out the current position easily by keeping a separate array of positions for each resource href, and using the page index of the currently visible resource (eg. https://github.com/readium/r2-navigator-swift/blob/839e0c4900a84b9e337e7a3d836f0b78c7d9c28b/r2-navigator-swift/PDF/PDFNavigatorViewController.swift#L221).
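A minimal sketch of both steps, written in Python rather than Swift for brevity. It assumes the page count of each PDF is already known (for instance from the link property suggested above) and that positions are numbered from 1. The function names and data shapes are illustrative and are not the r2-navigator-swift API.

```python
from typing import Dict, List, Tuple

def build_positions(reading_order: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Map each resource href to its publication-wide positions.

    `reading_order` holds (href, page count) pairs, in reading order.
    """
    positions_by_href: Dict[str, List[int]] = {}
    next_position = 1  # assumption: positions start at 1
    for href, page_count in reading_order:
        positions_by_href[href] = list(range(next_position, next_position + page_count))
        next_position += page_count
    return positions_by_href

def current_position(positions_by_href: Dict[str, List[int]], href: str, page_index: int) -> int:
    """Look up the position of the visible page (0-based index) of a resource."""
    return positions_by_href[href][page_index]

lcpdf_positions = build_positions([("part1.pdf", 3), ("part2.pdf", 2)])
```

With these inputs, the first page of `part2.pdf` is the fourth position of the publication.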

EPUB

The tricky part that needs to be discussed...

How to create the positionList?

Among the solutions discussed to split a resource into pages:

  • characters: This might be more accurate, but we need to parse each resource to calculate it, and we need the passphrase if the book is encrypted with LCP.
  • bytes: This is quick and easy (read in the ZIP entries or encryption.xml for LCP) and reliable enough in my opinion. However, there's no way we can match bytes with the DOM in a web view (not necessarily a problem if we don't match accurately, see section below).
  • scroll size: This is the most accurate on a given device (take into account images and layout), but highly inefficient since we have to load all the resources in a web view in the background. Moreover, it doesn't work well across devices because the calculated positionList might be different. Not a good solution IMHO.

Both the characters and bytes methods are pretty reliable to express the relative size between reading order resources and publications, as long as the chapters are not image based.

How to find out reliably the currently displayed position from positionList?

I think we agreed on a call that there's no way to accurately find the current position in an EPUB. The DOM displayed in a web view is dynamic and might not be equivalent to the one parsed from the static XHTML files. We can however approximate it:

  • Using progression in the resource to calculate the index of the position: So far the progression has been a pretty reliable way to position a page in a web view, and it could work well across platforms here too. It's not such a problem if we don't match the exact position that was split arbitrarily (bytes or characters), as long as we are reliably imprecise across devices. We need to end up at the same page when sharing a position index between devices. On the plus side, this is easy to implement, so we can run some tests quickly.
  • Calculating the character offset: This might be a more reliable way to match exactly the position if it was parsed using characters. It might not be 100% reliable though since the DOM in the web view is not the same as in the static XHTML. Moreover, it is much more complicated and the added value is not clear to me compared to using progression.
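The progression-based approach could look like the sketch below. It assumes that each resource owns a contiguous run of positions and that progression is a number between 0 and 1. The function name and the clamping rules are illustrative choices, not something agreed in this thread.

```python
def position_from_progression(first_position: int, position_count: int, progression: float) -> int:
    """Approximate the current position from the progression in a resource.

    `first_position` is the first position owned by the resource and
    `position_count` is how many positions the resource was split into.
    """
    if position_count < 1:
        raise ValueError("a resource must own at least one position")
    # Clamp, since a web view may report values slightly out of range.
    progression = min(max(progression, 0.0), 1.0)
    # A progression of exactly 1.0 must still map to the last position.
    index = min(int(progression * position_count), position_count - 1)
    return first_position + index
```

Every platform applying the same formula to the same positionList lands on the same position index, which is the "reliably imprecise" property we are after.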

Fixed layout vs reflowable

There's the added difficulty that an EPUB can contain both fixed layout and reflowable resources. Fixed layout is straightforward, one resource = one page. But we need to take it into account when calculating the positionList instead of only splitting by characters/bytes.
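One possible way to handle the mix, as a hedged Python sketch: a fixed layout resource always yields a single position, while a reflowable resource is split by bytes. The `Resource` and `Position` types, the 1-based numbering and the default position length are placeholders for illustration. The actual length has to be agreed across platforms.

```python
import math
from dataclasses import dataclass
from typing import List

POSITION_LENGTH = 1024  # placeholder value, must be identical on every platform

@dataclass
class Resource:
    href: str
    byte_length: int
    fixed_layout: bool = False

@dataclass
class Position:
    href: str
    position: int       # index in the whole publication, starting at 1
    progression: float  # where the position starts in its resource (0.0 to 1.0)

def build_position_list(reading_order: List[Resource],
                        position_length: int = POSITION_LENGTH) -> List[Position]:
    positions: List[Position] = []
    for resource in reading_order:
        if resource.fixed_layout:
            count = 1  # one resource = one page
        else:
            # Even an empty resource gets one position, so it stays reachable.
            count = max(1, math.ceil(resource.byte_length / position_length))
        for i in range(count):
            positions.append(Position(resource.href, len(positions) + 1, i / count))
    return positions

mixed_positions = build_position_list([
    Resource("cover.xhtml", 50_000, fixed_layout=True),
    Resource("chapter1.xhtml", 2_500),
    Resource("empty.xhtml", 0),
])
```

Here the fixed layout cover counts as one position despite its size, and the 2,500-byte chapter is split into three.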

Side discussion

Calculating the positionList might be slow and memory/CPU-intensive (eg. for LCPDF we have to load all the PDFs in memory). I don't think that it's necessary to expose an asynchronous API for Publication.positionList. The caller can wrap it in a background process if it doesn't need the positionList synchronously.

However, we could benefit from having a cache in the streamer to store the calculated positionList (eg. as JSON).

  • If we create a cache, it must be extensible for other types of data that we might need in the future (eg. information parsed from each resource needed by the navigator).
  • The cache data should be associated with the publication's release identifier and not file path, to avoid duplicates and outdated positionList. For information, we don't expose the release identifier in Publication, but it can be retrieved privately directly in the streamer for EPUB.
  • I don't think this should be persisted by the testapp itself, because this data is actually required by the navigator. It would complicate usage and increase the risk of wrong data, putting accurate positioning at risk.

After today's call, the group consensus is:

  • EPUB reflowable should be split by bytes, and the position retrieved using the resource progression.
  • No (on-disk) cache in the streamer, as we don't want to add persistence on behalf of the host app. Since we're using bytes for EPUB, only PDF might have performance issues.
  • A first prototype will be done on two platforms to test the interoperability of this solution.

I implemented the solution for Swift:
readium/r2-shared-swift#66
readium/r2-streamer-swift#118
readium/r2-navigator-swift#65

This adds two properties to Publication: positionList and positionListFactory. The factory is a closure provided by the parsers to be able to build the positionList lazily.

positionListFactory is overwritable by the host app if needed. For example, using a different closure, a host app can implement a cache system by pulling the cached positionList from a database.
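To illustrate the pattern without reproducing the Swift code from the PRs, here is a rough Python stand-in. The property names mirror the ones above, but the class, the memoization and the `cache` dictionary are assumptions made for this sketch.

```python
from typing import Callable, List, Optional

class Publication:
    def __init__(self, position_list_factory: Callable[[], List[int]]):
        # Provided by the parser, replaceable by the host app.
        self.position_list_factory = position_list_factory
        self._position_list: Optional[List[int]] = None

    @property
    def position_list(self) -> List[int]:
        # Built lazily on first access, then reused.
        if self._position_list is None:
            self._position_list = self.position_list_factory()
        return self._position_list

factory_calls = []

def parser_factory() -> List[int]:
    factory_calls.append("parser")  # track how often the expensive path runs
    return [1, 2, 3]

publication = Publication(parser_factory)
first_access = publication.position_list
second_access = publication.position_list  # does not call the factory again

# Host app override: serve the positionList from its own cache instead.
cache = {"some-release-identifier": [1, 2, 3, 4]}
cached_publication = Publication(parser_factory)
cached_publication.position_list_factory = lambda: cache["some-release-identifier"]
cached_result = cached_publication.position_list
```

The parser factory runs once for the first publication and never for the second one, since the host app's closure replaces it before the first access.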

Regarding the EPUB implementation, I used a size of 3500 bytes for splitting a resource into a number of positions. This amounts to about one page of text on an iPad. Of course, this value needs to be the same on every platform, so we might need to experiment a bit to find the sweet spot.

Note that the size of a "page" for this calculation should preferably be compatible with the notion of "printed page" in LCP. The LCP spec contains:

"For the print right, a page is defined as follows:
The page as defined in the Publication, if it is pre-paginated (fixed layout) OR
The page as defined by the page-list nav element of the EPUB Navigation Document, if this exists OR
1024 Unicode characters for all other cases"

... 1024 Unicode characters (not bytes).

Two solutions:

  • an evolution of the LCP spec (it's still possible for this very subject, but we must hurry).
  • consider that 1024 characters ~= 1024 bytes (but this is true only for ASCII characters coded in UTF-8) or a similar approximation.
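A quick Python check of how far the second approximation can drift. The sample strings are arbitrary, and the check only compares raw text, while the byte size of a real XHTML resource also includes its markup.

```python
def utf8_length(text: str) -> int:
    """Number of bytes needed to store `text` as UTF-8."""
    return len(text.encode("utf-8"))

# 1024 characters each, but very different byte sizes.
ascii_text = "a" * 1024     # 1 byte per character in UTF-8
french_text = "é" * 1024    # 2 bytes per character in UTF-8
japanese_text = "本" * 1024  # 3 bytes per character in UTF-8

byte_sizes = [utf8_length(t) for t in (ascii_text, french_text, japanese_text)]
```

So 1024 bytes hold 1024 characters of plain ASCII, but only about a third as many characters of Japanese text.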

Note also that if a page-list is provided with an EPUB file, Publication.positionList should IMO reflect it, with no need for computed values.

> ... 1024 Unicode characters (not bytes).

Since we already agreed that the positionList would be an approximation – because we don't have an easy way to map the current unicode character, or byte for that matter in the webview – I think that keeping bytes is the best approach. It's how the RMSDK used to calculate the page numbers and it's much more efficient to create the positionList (right now, we don't have a cache). Using the resources' content would require opening all of them – with decryption if needed – upon Publication parsing.

Retrieving the current position in the web view is particularly difficult for dynamic books, because the rendered DOM might not be the same as the one in the Container that we use to construct the positionList.

> Note also that if a page-list is provided with an EPUB file, Publication.positionList should IMO reflect it, with no need for computed values.

I agree that it would be better to reflect the actual page-list, unless we recognize that positions are not in practice equivalent to pages.

There are some technical difficulties in retrieving the current position in the web view if we used the page-list though, because its entries can point to particular DOM elements, so we need to figure out which one we're in.

Is there any discussion or specification on how to calculate the positionList for audiobooks yet? Does it make any sense to have it at all?

cc @HadrienGardeur

> Is there any discussion or specification on how to calculate the positionList for audiobooks yet? Does it make any sense to have it at all?

I don't think that it makes sense to calculate positions for audiobooks.

We can use the temporal media fragments and progression/totalProgression, that's enough.
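As a rough illustration of what such a locator could carry, here is a Python sketch. It assumes that totalProgression is weighted by track duration, which this thread does not specify, and the function name and dictionary keys are only illustrative. The `t=` fragment follows the temporal dimension of W3C Media Fragments.

```python
from typing import Dict, List, Union

def audio_locator(durations: List[float], track_index: int, offset: float) -> Dict[str, Union[str, float]]:
    """Describe a playback point `offset` seconds into the given track.

    `durations` lists the duration in seconds of each track, in reading order.
    """
    elapsed_before = sum(durations[:track_index])
    return {
        "fragment": f"t={offset:g}",  # temporal media fragment
        "progression": offset / durations[track_index],
        # Assumption: total progression is weighted by duration.
        "totalProgression": (elapsed_before + offset) / sum(durations),
    }

audio_example = audio_locator([600.0, 300.0, 100.0], track_index=1, offset=150.0)
```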

If we agree on 1024 bytes as the distance between two positions, we'd better update the LCP spec quickly (both the Readium spec and the ISO draft).

Please add a thumb up if you agree with 1024 bytes.

> I agree that it would be better to reflect the actual page-list, unless we recognize that positions are not in practice equivalent to pages.

It seems we all agree that positions are defined as an approximation of the notion of page, when this notion is not expressed clearly in an ebook. Please comment if you disagree.

> There are some technical difficulties in retrieving the current position in the web view if we used the page-list though, because its entries can point to particular DOM elements, so we need to figure out which one we're in.

I didn't think about this one. But do we need to compute a current position? We need the current locator, which can be expressed with sufficient precision as a progression (plus a specific DOM-related measure in the case of Readium Desktop).

> If we agree on 1024 bytes as the distance between two positions, we'd better update the LCP spec quickly (both the Readium spec and the ISO draft).
>
> Please add a thumb up if you agree with 1024 bytes.

We've already agreed on this during a call, but I've added a 👍 anyway.

> Note also that if a page-list is provided with an EPUB file, Publication.positionList should IMO reflect it, with no need for computed values.

I disagree about that statement, I think that in every EPUB we absolutely need to compute a position list.

A page-list is not the equivalent of a position list:

  • it's based on strings and not integers, which makes it complicated to provide an affordance for jumping to them
  • pages in a page list can be spread very far apart from one another, which would not provide a usable reference that users can share between them
  • pages in a page list can completely skip resources in the reading order, which would make them impossible to reference between users

> I didn't think about this one. But do we need to compute a current position? We need the current locator, which can be expressed with sufficient precision as a progression (plus a specific DOM-related measure in the case of Readium Desktop).

IMO not providing a position (hence a positionList) consistently will break the API because host apps won't be able to rely on a consistent interface, for example to present a page scroller.

> I don't think that it makes sense to calculate positions for audiobooks. We can use the temporal media fragments and progression/totalProgression, that's enough.

While I agree that it makes less sense for audiobooks, I think it's still worthwhile to generate a positionList. For the same reason I mentioned above, we want to provide a consistent API so that the host apps can rely on it with generic uses. For example, a page scroller could be used to scroll through an audiobook as well.

> A page-list is not the equivalent of a position list:
>
>   • it's based on strings and not integers, which makes it complicated to provide an affordance for jumping to them

That is right.

>   • pages in a page list can be spread very far apart from one another, which would not provide a usable reference that users can share between them
>   • pages in a page list can completely skip resources in the reading order, which would make them impossible to reference between users

Reading this part, I'm wondering why page lists would be explicitly added by publishers if they don't allow users to share and reference them in a proper way.

If there is a notion of position list on one side and a notion of page list on the other side, I don't see how we can design a good UX in reading apps. We are told that exposing page lists like we expose ToC is bad, and that they should be accessed via "go to" actions ... like position lists.

> We are told that exposing page lists like we expose ToC is bad, and that they should be accessed via "go to" actions ... like position lists.

With strings, the only way we can expose a page-list is like a ToC. You can't build an affordance with a text field, that would be a usability nightmare.

> You can't build an affordance with a text field, that would be a usability nightmare.

A best practice should be discussed among app developers. In Thorium, we have planned a "go to page x" affordance and need to map it to the proper locator. If there is a page-list in the publication, this seems the proper list to use. If there is none, the position-list seems to be the proper fallback. Having a "go to" wired to positions plus an additional ugly screen of page numbers is a bad solution.

> In Thorium, we have planned a "go to page x" affordance and need to map it to the proper locator.

Which type of field do you plan on using for that affordance?

The current "go to page" affordance in Thorium is a simple text field where users type in an arbitrary single-line string of characters. This is sufficient to meet accessibility requirements for the classroom scenario: "teacher asks students to open page '45' (or 'IX' in Roman numerals) in their printed publications, or in the digital equivalent provided by the EPUB3 @epub:type=pagebreak mechanism". Obviously, this is a "naive" string match, based on a string of characters input, with minimal cleanup/normalization (i.e. left-right trimming of insignificant whitespace, to match the syntactical rules of XML/XHTML NavDoc nav@epub:type=page-list).
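For readers unfamiliar with the approach, such a naive match could be sketched as follows in Python. This is not Thorium's actual code: the function name, the (label, href) pairs and the sample hrefs are made up for illustration.

```python
from typing import List, Optional, Tuple

def find_page_href(page_list: List[Tuple[str, str]], user_input: str) -> Optional[str]:
    """Return the href of the page-list entry whose label matches the input.

    `page_list` holds (label, href) pairs as they would be read from the
    page-list nav. Both sides are only trimmed, and the comparison is exact.
    """
    wanted = user_input.strip()
    for label, href in page_list:
        if label.strip() == wanted:
            return href
    return None

sample_page_list = [
    ("IX", "front.xhtml#page-ix"),
    (" 45 ", "chapter3.xhtml#page-45"),
]
```

Since the match is exact after trimming, an input of "ix" would not find page "IX".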

The notion of "position" discussed here (i.e. fragmentation of publication resources in the readingOrder by arbitrary units of 1024 bytes) is different. Apples and pears.

> The notion of "position" discussed here (i.e. fragmentation of publication resources in the readingOrder by units of 1024 bytes) is different.

Which affordance should be associated with such data then, if any?

> Which affordance should be associated with such data then, if any?

A similar one where the field doesn't accept a string but simply an integer. You can also add +/- buttons or a SeekBar.