pagebreak filter doesn't work with Commonmark

Question

dmurdoch opened this issue 2 years ago · 7 comments

The pagebreak.lua filter depends on the raw_tex extension on the markdown reader, but that extension is not supported by commonmark or commonmark_x. This results in \pagebreak or \newpage being written to the output file with the backslash escaped, so the macro is visible instead of being translated into a page break.

Example: working in the lua-filters/pagebreak directory, this command

  pandoc --from commonmark  --to pdf sample.md -o sample.pdf --lua-filter pagebreak.lua

produces this output:

The solution is to look for the macros in the Para() function of the filter. A complication is that commonmark+sourcepos splits each macro into two parts and wraps them in Span elements, so the Para() function needs to handle that case too.

Answer 1 · 2022-11-29T19:57:06.000Z

You can make this work in CommonMark with

```{=latex}
\pagebreak
```

Requires the raw_attribute extension which is enabled by default in commonmark_x.

Answer 2 · 2022-11-29T20:24:27.000Z

Sure, but my thinking went as follows:

In favour of the change:

There are a lot of existing documents using the simpler syntax, and they'll all be broken if Pandoc transitions to CommonMark without this change. It was one of the first issues I saw when I tried to use the sourcepos extension in R Markdown documents.
Markdown is supposed to be readable, and a bare \pagebreak is more readable than the fenced solution.

Against the change:

It doesn't fit the CommonMark design very well, which is the reason the raw_tex extension is incompatible with the commonmark reader. The spec says "Backslashes before other characters are treated as literal backslashes".

But CommonMark doesn't provide a way to enter a page break, so it needs to be some kind of extension, and this seems like a fairly harmless one. People who really want paragraphs containing nothing but \newpage or \pagebreak should just avoid using the filter.

Answer 3 · 2022-12-01T09:19:12.000Z

I think my preferred solution here would be to create a new filter that converts the special paragraphs into LaTeX, e.g.,

function Para (p)
  if is_pagebreak(p) then
    return pandoc.RawBlock('latex', pandoc.utils.stringify(p))
  end
end

Users would run the filter before pagebreak.lua.

There are two reasons for that:

It's cleaner.
Making the filter act on Para elements has a significant performance impact; most users should not have to pay that.
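
The is_pagebreak helper in the snippet above is left undefined in the thread. The following is a minimal sketch of what such a pre-filter might look like, not the maintainers' implementation. It assumes the macro arrives as one or more Str elements, possibly wrapped in Span elements as the opening post reports for commonmark+sourcepos. The names PAGEBREAK_MACROS and flatten_text are hypothetical.

```lua
-- Hypothetical pre-filter; intended to run before pagebreak.lua.
-- Macros treated as page breaks when they are the only paragraph content.
local PAGEBREAK_MACROS = { ['\\pagebreak'] = true, ['\\newpage'] = true }

-- Concatenate the text of Str elements, descending into Spans.
-- Returns nil as soon as any other inline type (Space, Emph, ...) appears.
local function flatten_text(inlines)
  local parts = {}
  for _, el in ipairs(inlines) do
    if el.t == 'Str' then
      parts[#parts + 1] = el.text
    elseif el.t == 'Span' then
      local inner = flatten_text(el.content)
      if inner == nil then return nil end
      parts[#parts + 1] = inner
    else
      return nil
    end
  end
  return table.concat(parts)
end

-- True if the paragraph consists of exactly one known macro.
local function is_pagebreak(p)
  local text = flatten_text(p.content)
  return text ~= nil and PAGEBREAK_MACROS[text] == true
end

function Para(p)
  if is_pagebreak(p) then
    -- Hand the macro on as raw LaTeX so pagebreak.lua can translate it.
    return pandoc.RawBlock('latex', pandoc.utils.stringify(p))
  end
end
```

Rejecting paragraphs that contain anything other than Str and Span keeps the check cheap and avoids touching ordinary text that merely mentions one of the macros.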

I'd be more open to adding support for special divs, so commonmark_x users could write

::: pagebreak
:::

or

{.pagebreak}
---

For plain CommonMark, an HTML-based syntax could be acceptable:

<hr class="pagebreak"/>
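
As a rough illustration of the div-based idea, a filter could match on the class instead of on paragraph text. This sketch covers only the empty ::: pagebreak div and only emits raw LaTeX; how the {.pagebreak} and HTML variants appear in the AST is not shown in the thread, so they are left out. The helper names has_class and is_pagebreak_div are hypothetical.

```lua
-- Hypothetical sketch: treat an empty div with class "pagebreak" as a page break.
local function has_class(el, name)
  for _, class in ipairs(el.classes) do
    if class == name then return true end
  end
  return false
end

-- True for a div that carries the class and has no block content.
local function is_pagebreak_div(div)
  return has_class(div, 'pagebreak') and #div.content == 0
end

function Div(div)
  if is_pagebreak_div(div) then
    -- Assumption: a later filter such as pagebreak.lua converts raw LaTeX
    -- page breaks for non-LaTeX output formats.
    return pandoc.RawBlock('latex', '\\newpage')
  end
end
```

Because this acts on Div elements only, ordinary paragraphs are never inspected.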

Answer 4 · 2022-12-01T09:43:31.000Z

The existing filter already works on Para elements: it looks for a single FF character there. The proposed change makes the test more complicated, so it will be slower, but is it really enough of a difference to be noticeable? (In the context where I'm using it I think the answer is almost certainly no: I run knitr, then Pandoc, then pdflatex. The Pandoc step is almost always very quick compared to the others.)

Answer 5 · 2022-12-01T09:50:50.000Z

You're right. I forgot about that. I'm still hesitant to add this kind of special case here.

Answer 6 · 2022-12-01T09:51:32.000Z

Regarding your proposed syntax choices: I think the one using ::: is the most readable, so it's the one I'd choose if new syntax is needed. But the backward compatibility of \pagebreak and its familiarity to people who know LaTeX are still points in its favour.

Answer 7 · 2023-04-14T14:55:40.000Z

I've moved the code for the pagebreak filter to pandoc-ext/pagebreak. The code has been updated to be more configurable; it would now be easier to implement the suggested changes without the mentioned drawbacks. PRs welcome.

Closing this here.