Omikhleia/markdown.sile

<br/> and <wbr/> unrecognized (inline HTML support)

ctrlcctrlv opened this issue · 6 comments

Commonly used in Markdown, though.

Thanks for your feedback! First, as you might know, there are two existing ways to obtain hard line breaks in Markdown syntax without depending on HTML-like constructs:

  • Two trailing spaces at the end of a line = the standard Markdown but "invisible" way
  • A trailing backslash at the end of a line = corresponding to Pandoc's escaped_line_breaks extension, which I recently added to lunamark and which is therefore now supported here too.
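To make the two syntaxes above concrete, here is a minimal Python sketch of how they split a paragraph. It is only an illustration, not how lunamark implements it: the function name `split_hard_breaks` is hypothetical, and it deliberately ignores edge cases such as an escaped backslash (`\\`) at the end of a line.

```python
def split_hard_breaks(paragraph: str) -> list[str]:
    """Split one Markdown paragraph into output lines at hard line breaks.

    Simplified illustration: a hard break is a line ending with two spaces
    or with a single backslash, except on the last line of the paragraph.
    """
    rendered: list[str] = []  # finished output lines
    pending: list[str] = []   # soft-wrapped pieces of the current output line
    source_lines = paragraph.split("\n")
    for index, line in enumerate(source_lines):
        is_last = index == len(source_lines) - 1
        if not is_last and line.endswith("\\"):
            # Backslash form (Pandoc's escaped_line_breaks): drop the marker.
            pending.append(line[:-1].strip())
            hard = True
        elif not is_last and line.endswith("  "):
            # Standard form: two (or more) trailing spaces.
            pending.append(line.strip())
            hard = True
        else:
            # Soft line break: the next source line continues this output line.
            pending.append(line.strip())
            hard = False
        if hard:
            rendered.append(" ".join(pending))
            pending = []
    if pending:
        rendered.append(" ".join(pending))
    return rendered
```

With this sketch, a paragraph mixing both forms and one soft break yields three output lines, the last two source lines being joined.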

In terms of the underlying SILE construct, they are (currently) translated into a \cr. So there is actually a way to achieve the intended presentation. See also sile-typesetter/sile#416, where I mentioned that.

However, you are right: more generally, we might eventually consider adding support for some of the HTML constructs commonly used in Markdown and therefore likely to be found in user content. Since I need them too for book production, <hr class="..."/> also comes to mind: while strictly non-HTML Markdown has horizontal rules, these cannot be styled. Real books do not use a long full-width rule, though, but rather various kinds of dinkus / pendants / ornaments.

That being said, it's a complex topic: the lunamark reader has to support HTML (which I think it may already do, though I have not checked or tried yet), but then SILE also needs to support parsing and rendering (a decent subset of) HTML too. For now, there's still a lot to do to ensure regular (Pandoc-extended) Markdown is well supported so that this package set can be worth a "1.0", so personally I am afraid I am unlikely to have a look at HTML soon.

Also, #7 might be a prerequisite before starting any work on this: it's typically something we might want to make opt-in / opt-out.

Noted. I've started on it.

For the record, this relates to Pandoc's raw_html extension (enabled by default in Pandoc, but it can be disabled). Lunamark doesn't condition it via an extension option (as it is the default in Markdown).

@ctrlcctrlv Thanks for the attempt at PR #18. It was challenging and made me think harder about it. I created #29 with my findings. In brief: while using htmlparser was indeed an interesting option (and thanks too for having brought it to my attention), it wouldn't work well, because Pandoc and Lunamark don't exactly do what I would have (naively) thought.

So back on topic, let's indeed refocus on the initial scope of the issue. I edited the issue title: if we go for <br/>, we could as well support <wbr/> (from HTML5), which marks optional breakpoints where a line may break if needed. This could be a nice typographical addition (e.g. translating into a penalty in SILE). Neither would need an HTML parsing library, just some clever string checks. Wanna try? Or I can do it, if you want/prefer (?). I kept that topic in the "milestone 1.1", which has no due date yet, but ideally within a few weeks (more or less).
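To give an idea of what such string checks could look like, here is a rough Python sketch that picks only these two tags out of inline text and leaves everything else untouched. It is not the package's code: the function name `tokenize_breaks` and the token kinds `hardbreak` and `breakpoint` are made up for illustration, and how each token would map to a SILE construct (a \cr, a penalty) is left open.

```python
import re

# Matches only <br>, <br/>, <br />, <wbr>, <wbr/>, <wbr /> (case-insensitive).
# Any other HTML-like construct is deliberately left alone as plain text.
_BREAK_TAG = re.compile(r"<(w?br)\s*/?>", re.IGNORECASE)


def tokenize_breaks(text: str) -> list[tuple[str, str]]:
    """Split inline text into (kind, value) tokens.

    Kinds (hypothetical names): "text", "hardbreak" for <br/>,
    and "breakpoint" for <wbr/>.
    """
    tokens: list[tuple[str, str]] = []
    position = 0
    for match in _BREAK_TAG.finditer(text):
        if match.start() > position:
            # Plain text between the previous tag and this one.
            tokens.append(("text", text[position:match.start()]))
        kind = "hardbreak" if match.group(1).lower() == "br" else "breakpoint"
        tokens.append((kind, match.group(0)))
        position = match.end()
    if position < len(text):
        # Remaining plain text after the last recognized tag.
        tokens.append(("text", text[position:]))
    return tokens
```

A renderer could then walk the tokens and emit a hard line break for each `hardbreak` and an optional break opportunity for each `breakpoint`, while passing `text` tokens through unchanged.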

I'm finally on it. I found another small bug I want fixed on pandocast, so I'll do that on the same occasion, and ensure both conversion routes work.