w3c/epub-specs

Remove restriction on page numbers as text

Closed this issue · 5 comments

The techniques document currently includes this text under the page break markers section:

Do not include the page number as text content, as this practice forces the user to hear it announced wherever it occurs (e.g., without any context in the middle of a sentence).

While the concern is probably still valid, this seems too strongly worded. I think we can warn that page numbers can be confusing when they appear within the text without imposing an outright ban, which I'm not sure is even followed.

I agree, partly because assistive technologies are increasingly supporting the DPUB-ARIA roles, and partly because inserting page references with the prefix "page" localized to the language of the surrounding text is challenging...

@mattgarrish are you talking about this document? I suppose “page number as text” refers to descriptive text in addition to the page number itself (like @gregoriopellegrino mentions wrt. DPUB ARIA 1.1)?

In this context, though, I’m curious about the handling of e.g. roman numerals as page numbers (as is common for the front matter of printed books):

<span id="page-v" epub:type="pagebreak" role="doc-pagebreak" aria-label="v"/>

Does AT recognize roman numerals correctly or are non-arabic numerals discouraged in general?

> I suppose “page number as text” refers to descriptive text in addition to the page number itself

No, the restriction was on adding the page number as text rather than in an attribute, so it was banning this type of markup:

<span role="doc-pagebreak" epub:type="pagebreak" id="pg123">123</span>

Under the techniques document, you always had to use an empty element like this:

<span role="doc-pagebreak" epub:type="pagebreak" id="pg123" aria-label="123" />

Using hidden markers was reasonable enough in the past: AT didn't recognize the DPUB-ARIA roles, so someone listening to TTS playback would just get random numbers announced in the middle of sentences. They'd have to work out whether a number was part of the text or actually a print page reference. @gregoriopellegrino is noting that AT support is improving, so it should be possible now to differentiate the page breaks regardless of whether they are rendered text or not.
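
To make the difference concrete, here is a minimal Python sketch of how a consumer that understands the `doc-pagebreak` role could recover the page label from either form of markup. It is only an illustration, not how any particular AT works: the `page_label` helper and the wrapper `root` element are hypothetical, and preferring `aria-label` over text content is an assumption modeled on how accessible names are generally computed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace bound to the epub: prefix in EPUB content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"


def page_label(span_markup: str) -> Optional[str]:
    """Return the page label of one page break marker, or None.

    Hypothetical helper: accepts a single <span> as a string and
    handles both the empty-element and the visible-text forms.
    """
    # Wrap the fragment so the epub: prefix is declared for the parser.
    wrapped = f'<root xmlns:epub="{EPUB_NS}">{span_markup}</root>'
    span = ET.fromstring(wrapped)[0]

    # Only elements carrying the DPUB-ARIA role count as page breaks.
    if span.get("role") != "doc-pagebreak":
        return None

    # Assumption: an explicit aria-label wins over rendered text.
    label = span.get("aria-label")
    if label and label.strip():
        return label.strip()

    # Otherwise fall back to the text content of the element.
    text = "".join(span.itertext()).strip()
    return text or None


visible = '<span role="doc-pagebreak" epub:type="pagebreak" id="pg123">123</span>'
hidden = '<span role="doc-pagebreak" epub:type="pagebreak" id="pg123" aria-label="123" />'
print(page_label(visible), page_label(hidden))
```

Both forms yield the same label once the role is recognized, which is the point being made about improved AT support.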

> Does AT recognize roman numerals correctly or are non-arabic numerals discouraged in general?

Roman numerals have always been a bit problematic no matter where they are used, so if you can avoid them it's a good idea. That said, they're such an ingrained part of publishing for front matter that we can't realistically tell publishers to stop using them.

AT often relies on heuristics to determine how to announce them, so ideally what we need is SSML support to tell it how to pronounce the numbers correctly. That said, if a user gets a garbled page number they should be able to have the text spelled out and be able to decipher the number that way.
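
As a rough illustration of why heuristics are involved, the following Python sketch checks whether a label is a well-formed roman numeral and, if so, returns its value. It is hypothetical and not taken from any AT: the `roman_to_int` name and the strict well-formedness pattern are assumptions made for this example.

```python
import re
from typing import Optional

# Value of each roman numeral letter (lowercase, as used in front matter).
_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

# Strict pattern for numerals up to 3999: thousands, hundreds, tens, units.
_WELL_FORMED = re.compile(
    r"m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})"
)


def roman_to_int(label: str) -> Optional[int]:
    """Return the numeric value of a roman numeral label, or None."""
    text = label.strip().lower()
    if not text or not _WELL_FORMED.fullmatch(text):
        return None

    total = 0
    # Pair each letter with the one after it (a space marks the end).
    for current, following in zip(text, text[1:] + " "):
        value = _VALUES[current]
        # A smaller value before a larger one is subtracted (e.g. "ix").
        if _VALUES.get(following, 0) > value:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("v"), roman_to_int("ix"), roman_to_int("page"))
```

Note that an ordinary word such as "mix" is also a well-formed numeral (1009), so the letters alone cannot tell a tool whether it is looking at a number. That is why markup such as the role, or explicit pronunciation hints, matters.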

That makes sense, thank you for elaborating @mattgarrish!

Regarding roman numerals, I guess one could use the actual Unicode code points U+2160 through U+2188 instead of the Latin letters:

<span id="page-v" epub:type="pagebreak" role="doc-pagebreak" aria-label="ⅴ"/>  <!-- ARIA label uses U+2174 -->

though, as @gregoriopellegrino mentions, it would require ATs to support that…

I don't believe those are any better supported, certainly not for making sense of them as compound numbers.

And if they were to be read out by their Unicode names, I'd think that could be even more difficult to decipher, since you'd have to listen through a lot of "roman numeral" announcements and have to know that "roman numeral one roman numeral ten" is actually 9, for example, rather than having "i" and "x" spelled out.
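
For anyone curious what those announcements might look like, here is a small Python sketch using the standard `unicodedata` module. It lists the Unicode character names for a label written with the dedicated code points, and shows that NFKC normalization folds them back to plain Latin letters. The helper names `character_names` and `fold_to_latin` are hypothetical, and whether any AT reads these names aloud is an assumption, not something tested here.

```python
import unicodedata
from typing import List


def character_names(label: str) -> List[str]:
    """Return the Unicode character name of each character in the label."""
    return [unicodedata.name(ch) for ch in label]


def fold_to_latin(label: str) -> str:
    """Apply NFKC, which maps these roman numeral code points to letters."""
    return unicodedata.normalize("NFKC", label)


# "ix" (9) written with U+2170 and U+2179 instead of Latin letters.
nine = "\u2170\u2179"

print(character_names(nine))
# ['SMALL ROMAN NUMERAL ONE', 'SMALL ROMAN NUMERAL TEN']

print(fold_to_latin(nine))
# ix
```

A listener would still have to combine "one" and "ten" into 9 themselves, so the dedicated code points do not remove the need to understand subtractive notation.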

It's a tough problem.