Improve Text (content) handling
Closed this issue · 1 comments
Description
Text content handling can be improved. The two main rules for controlling text are the preserveText and forceIndent markup rules. Both behave as intended, but this area needs some rethinking, especially preserveText.
My intention here is to normalize text nodes when preserveText is disabled (i.e. false) and produce output identical to the default behaviour of rendering engines (like the browser). Changes will include:
- Stripping newlines
- Stripping extraneous whitespace
Example
Take the following code sample:
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Sapien eget mi proin sed libero enim.
Turpis egestas sed tempus urna et pharetra pharetra massa.
</p>
The new logic to be introduced will (by default) result in the following:
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Sapien eget mi proin sed libero enim. Turpis egestas sed tempus urna et pharetra pharetra massa.
</p>
Essentially, newlines are collapsed into single spaces, the same way the browser would render such code.
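The normalization described above could be sketched roughly as follows. This is only an illustration, not the shipped implementation: the normalizeText helper and its signature are hypothetical, and the whitespace set is assumed to be the ASCII whitespace that browsers collapse by default (so non-breaking spaces are left alone).

```typescript
// Hypothetical helper for illustration; not the project's actual code.
// Collapses each run of ASCII whitespace (space, tab, form feed, CR, LF)
// into a single space and drops the leading/trailing space, which is
// roughly what a browser does when rendering ordinary text content.
function normalizeText(text: string, preserveText: boolean): string {
  // When preserveText is enabled the text node is returned untouched.
  if (preserveText) return text;

  return text
    .replace(/[ \t\f\r\n]+/g, " ") // collapse whitespace runs
    .replace(/^ | $/g, "");        // trim the single edge space, if any
}

const source = [
  "Lorem ipsum dolor sit amet.",
  "Sapien eget mi   proin sed libero enim.",
  "Turpis egestas sed tempus urna."
].join("\n");

const normalized = normalizeText(source, false);
console.log(normalized);
```

With preserveText set to true the same call would return the input unchanged, newlines included.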
Logic to consider
This new behaviour needs consideration when it comes to newline preservation under the global preserveLine rule. The data structures generated in the lexing process do a good enough job of identifying content types, so there may be room to introduce a new beautification rule within markup that controls whether newline preservation is respected within text content. It could be something like preserveTextLines.
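One way the proposed rule might interact with the others is sketched below. Everything here is an assumption made for illustration: the TextRules shape and formatText function are hypothetical, preserveTextLines is only the name floated above, and preserveLine is assumed to mean the maximum number of consecutive blank lines to keep.

```typescript
// Hypothetical rule shape; none of this is the project's confirmed API.
interface TextRules {
  preserveText: boolean;      // true: leave text nodes untouched
  preserveTextLines: boolean; // proposed rule: keep newlines inside text
  preserveLine: number;       // assumed: max consecutive blank lines kept
}

// Collapse runs of ASCII whitespace into one space and trim the edges.
const collapse = (s: string): string =>
  s.replace(/[ \t\f\r\n]+/g, " ").replace(/^ | $/g, "");

function formatText(text: string, rules: TextRules): string {
  if (rules.preserveText) return text;
  if (!rules.preserveTextLines) return collapse(text);

  // Keep line breaks, normalize whitespace within each line, and cap
  // runs of blank lines at preserveLine.
  const out: string[] = [];
  let blanks = 0;
  for (const line of text.split(/\r?\n/)) {
    const clean = collapse(line);
    if (clean === "") {
      blanks += 1;
      if (blanks <= rules.preserveLine) out.push("");
    } else {
      blanks = 0;
      out.push(clean);
    }
  }
  // Drop blank lines at the very start and end of the text node.
  return out.join("\n").replace(/^\n+|\n+$/g, "");
}

const sample = "one  two\nthree\n\n\n\nfour";
console.log(formatText(sample, { preserveText: false, preserveTextLines: true, preserveLine: 1 }));
```

Keeping the new rule separate from preserveText means the default (newlines collapsed) stays browser-like, while users who want their line breaks respected can opt in.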
The forceIndent rule, despite being related to content formatting, can carry on behaving as it does.
Solved and shipped in v0.3.0.beta. It might need some minor refinements in the future, but it is operational now. I will open a new issue when working through test cases for more context.