Non-breaking space should be justifiable
Omikhleia opened this issue.
From the Unicode Line Breaking Algorithm (UAX #14):
> ... then expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width.
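The no-break space ("nbsp") is therefore intended to take part in justification: like a regular space, it may stretch and shrink. It just doesn't allow a line break.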
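As a reading aid, the rule quoted above can be written as a small lookup. This is a minimal sketch, not code from SILE or any inputter; the function name `justification_behavior` is hypothetical, and treating THIN SPACE as "stretch only" is just one interpretation of "occasionally".

```python
# Sketch of the UAX #14 justification rule quoted above:
# which space characters may shrink or stretch when justifying a line.
NBSP = "\u00a0"        # U+00A0 NO-BREAK SPACE
THIN_SPACE = "\u2009"  # U+2009 THIN SPACE


def justification_behavior(ch: str) -> tuple[bool, bool]:
    """Return (can_shrink, can_stretch) for a space character (hypothetical helper)."""
    if ch in (" ", NBSP):
        # SPACE and NO-BREAK SPACE: both compressible and expandable.
        return (True, True)
    if ch == THIN_SPACE:
        # "Occasionally" expandable: assumed here to stretch but never shrink.
        return (False, True)
    # All other space characters normally have fixed width.
    return (False, False)


for ch, name in [(" ", "SPACE"), (NBSP, "NO-BREAK SPACE"),
                 (THIN_SPACE, "THIN SPACE"), ("\u2002", "EN SPACE")]:
    print(name, justification_behavior(ch))
```

The point for this issue is the second line of output: U+00A0 gets the same treatment as U+0020.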
On the parsers' side of things:
- Regarding pandocast, Pandoc expands `&nbsp;` (or `\ `, with the appropriate/default extension options) to U+00A0 within the `Str` AST node.
- Regarding markdown, Lunamark expands `&nbsp;` to U+00A0 and passes it to the `writer.string()` method.
- Regarding djot, `\ ` yields a dedicated `nbsp` AST node (which we expanded to U+00A0, but see below).
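For Pandoc in particular, the no-break space does not get a node of its own: it ends up inside the text of a `Str` node, whereas an ordinary space becomes a separate `Space` node. The sketch below uses a hand-written, trimmed fragment shaped like Pandoc's JSON AST (not actual Pandoc output) and a hypothetical helper, `find_nbsp_in_str_nodes`, to show where an inputter would have to look.

```python
import json

# Hand-written, trimmed fragment in the shape of Pandoc's JSON AST for
# "10&nbsp;km away": the nbsp stays inside the Str text, while the
# ordinary space is a separate Space node.
doc = json.loads(
    '{"blocks": [{"t": "Para", "c": ['
    '{"t": "Str", "c": "10\\u00a0km"}, {"t": "Space"}, {"t": "Str", "c": "away"}'
    ']}]}'
)


def find_nbsp_in_str_nodes(node, found=None):
    """Recursively collect the text of Str nodes containing U+00A0 (hypothetical helper)."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("t") == "Str" and "\u00a0" in node.get("c", ""):
            found.append(node["c"])
        for value in node.values():
            find_nbsp_in_str_nodes(value, found)
    elif isinstance(node, list):
        for item in node:
            find_nbsp_in_str_nodes(item, found)
    return found


print(find_nbsp_in_str_nodes(doc))
```

So the inputter cannot simply map AST node types to glue: it has to split `Str` text on U+00A0.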
On SILE's side, it doesn't seem that the typesetter / shaper / whatever component is involved[^1] considers U+00A0 as shrinkable/stretchable for the purpose of justification... Bah! Regardless, we can filter it in our various inputters and make the necessary adjustment.
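One way to picture that adjustment: when splitting text, give U+00A0 exactly the same stretch and shrink as an ordinary interword space and only mark it as a non-breaking point. This is a sketch under stated assumptions: the `Glue` type, the `tokenize` function and the width/stretch/shrink values are all hypothetical and illustrative, and none of this is SILE's actual node API.

```python
from dataclasses import dataclass

NBSP = "\u00a0"


@dataclass
class Glue:
    # Hypothetical glue spec (illustrative, in em): natural width,
    # stretchability, shrinkability, and whether a line may break here.
    width: float
    stretch: float
    shrink: float
    breakable: bool


# Illustrative interword values only, not taken from SILE.
INTERWORD = dict(width=0.333, stretch=0.167, shrink=0.111)


def tokenize(text: str) -> list:
    """Split text into words and Glue; nbsp is justifiable but unbreakable."""
    tokens = []
    word = ""
    for ch in text:
        if ch in (" ", NBSP):
            if word:
                tokens.append(word)
                word = ""
            # Same stretch/shrink for both spaces; only breakability differs.
            tokens.append(Glue(**INTERWORD, breakable=(ch == " ")))
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


print(tokenize("10\u00a0km away"))
```

The design choice here is to keep a single glue specification and vary only breakability, which is what the UAX #14 wording implies.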
Moreover, in Djot, since any content can have attributes, we could accept e.g. `\ {.fixed}` for a fixed-width inter-word space, if need be.
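A possible mapping for such a `.fixed` class is sketched below: a plain `\ ` gives an unbreakable but justifiable space, and `\ {.fixed}` drops the stretch and shrink. The `NbspNode` class is a hypothetical stand-in for the djot `nbsp` node with its attributes, and `nbsp_glue` and its default values are likewise assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class NbspNode:
    # Hypothetical stand-in for djot's nbsp AST node and its classes,
    # e.g. ["fixed"] for the input `\ {.fixed}`.
    classes: list[str] = field(default_factory=list)


def nbsp_glue(node: NbspNode, width: float = 0.333,
              stretch: float = 0.167, shrink: float = 0.111) -> dict:
    """Unbreakable space: justifiable by default, fixed if marked {.fixed}."""
    if "fixed" in node.classes:
        # Fixed-width inter-word space: no stretch, no shrink.
        return {"width": width, "stretch": 0.0, "shrink": 0.0, "breakable": False}
    return {"width": width, "stretch": stretch, "shrink": shrink, "breakable": False}


print(nbsp_glue(NbspNode()))
print(nbsp_glue(NbspNode(classes=["fixed"])))
```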
Footnotes
[^1]: IMHO, the SIL and XML inputters should actually be responsible for that, not the typesetters/shapers, etc., i.e. it's perhaps best regarded as something to be done at the AST level. But heh, the SIL inputter doesn't even split paragraphs currently: this is partly done by the typesetter (see the `typesetter.parseppattern` setting, &c.). Perhaps a debatable separation-of-concerns issue...