w3c/epub-specs

Invalid use of infra definition of whitespace

mattgarrish opened this issue · 3 comments

As noted in #2636, we mix the xml and infra definitions of whitespace and whitespace handling for the viewport meta tag value in this paragraph:

The authoring requirements in this section apply after whitespace normalization [xml] (i.e., after a reading system strips leading and trailing whitespace and compacts all instances of multiple whitespace within the attribute to single spaces). EPUB creators MAY include any valid ascii whitespace [infra] in the authored tag so long as the result is valid to this definition.
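The normalization described in the quoted paragraph could be sketched roughly as follows. This is a minimal Python illustration that assumes whitespace means xml's `S` production (space, tab, carriage return, line feed). The function name `normalize_viewport_value` is hypothetical and does not come from the spec or any reading system.

```python
import re

# xml 1.0 production S: #x20 | #x9 | #xD | #xA (space, tab, CR, LF).
XML_WHITESPACE_CHARS = "\x20\x09\x0d\x0a"
_XML_S_RUN = re.compile(r"[\x20\x09\x0d\x0a]+")

def normalize_viewport_value(value: str) -> str:
    """Hypothetical helper: collapse every run of xml whitespace to a
    single space, then strip leading and trailing xml whitespace."""
    return _XML_S_RUN.sub(" ", value).strip(XML_WHITESPACE_CHARS)

print(normalize_viewport_value("  width=1200,\t\n  height=600  "))
# width=1200, height=600
```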

But the infra definition allows form feed, while xml's does not. If you put a form feed character in an xhtml document, the parser will halt. You could try to twist the definition into saying that form feed is not a "valid" whitespace character and is therefore excluded, but why refer to a definition that includes invalid whitespace characters at all? The infra reference only works for html. Since we refer to xml's definition of normalization everywhere else in that section, we should be consistent and refer to xml's whitespace definition, too.
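To make the difference concrete, here is a small Python sketch. It compares the two character sets and uses the standard library's `xml.etree.ElementTree` (backed by the expat parser) as a stand-in for a reading system's XML parser. The names `XML_WHITESPACE`, `INFRA_ASCII_WHITESPACE`, and `is_well_formed` are illustrative only.

```python
import xml.etree.ElementTree as ET

# xml 1.0 "S" production: space, tab, carriage return, line feed.
XML_WHITESPACE = frozenset("\x20\x09\x0d\x0a")
# infra "ASCII whitespace": tab, line feed, form feed, carriage return, space.
INFRA_ASCII_WHITESPACE = frozenset("\x09\x0a\x0c\x0d\x20")

# The only character infra allows that xml does not is form feed (U+000C).
print(sorted(hex(ord(c)) for c in INFRA_ASCII_WHITESPACE - XML_WHITESPACE))
# ['0xc']

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as xml, False on a well-formedness error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

ok = '<meta name="viewport" content="width=1200,\nheight=600"/>'
bad = '<meta name="viewport" content="width=1200,\x0cheight=600"/>'
print(is_well_formed(ok), is_well_formed(bad))  # True False
```

The line feed is accepted, but the form feed in the same position is a fatal well-formedness error. It is not merely treated as "invalid whitespace."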

(The other concern I mentioned with package document values is probably fine as is. The two references to infra functions in that section - for leading and trailing whitespace removal and collapsing whitespace in a string - don't change the rules of whitespace authoring in xml. They just add an extra character for processing that isn't ever going to appear. Since xml passes through whitespace in elements, we need some definition for this handling.)
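The point about the extra character never appearing can be checked with a rough Python sketch. It approximates infra's "strip and collapse ASCII whitespace" operation and runs it alongside the same operation restricted to xml whitespace. Both function names are hypothetical. The two agree on any text an xml parser accepts and diverge only on form feed, which cannot occur in well-formed xml.

```python
import re

# infra ASCII whitespace includes form feed (U+000C); xml's S production does not.
INFRA_RUN = re.compile(r"[\x09\x0a\x0c\x0d\x20]+")
XML_RUN = re.compile(r"[\x09\x0a\x0d\x20]+")

def infra_strip_and_collapse(s: str) -> str:
    """Approximation of infra's 'strip and collapse ASCII whitespace'.
    After collapsing, every whitespace run is a single space, so stripping
    spaces is enough to remove leading and trailing whitespace."""
    return INFRA_RUN.sub(" ", s).strip(" ")

def xml_strip_and_collapse(s: str) -> str:
    """The same operation, recognizing only xml whitespace."""
    return XML_RUN.sub(" ", s).strip(" ")

value = "\n  package   title\t"
print(infra_strip_and_collapse(value) == xml_strip_and_collapse(value))  # True
```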

To put it another way, your proposal is to remove this sentence:

EPUB creators MAY include any valid ascii whitespace [infra] in the authored tag so long as the result is valid to this definition.

I agree. For better or for worse, we work with XHTML, meaning that the xml definition should prevail.

That being said, and although this is a corner case, maybe a short note right after this (edited) section is worth adding? Just to make this minor difference clear to authoring tools and EPUB creators? Otherwise, they may find themselves facing a hard-to-find bug.

Just to make this minor difference clear to authoring tools and EPUB creators?

Do you mean to explain the difference between valid html and xhtml whitespace characters? Sounds reasonable, if so.

Exactly.