pagebreak filter doesn't work with Commonmark
dmurdoch opened this issue · 7 comments
The `pagebreak.lua` filter depends on the `raw_tex` extension on the `markdown` reader, but that extension is not supported by `commonmark` or `commonmark_x`. This results in `\pagebreak` or `\newpage` being written to the output file with the backslash escaped, so the macro is visible instead of being translated into a page break.
Example: working in the `lua-filters/pagebreak` directory, this command
pandoc --from commonmark --to pdf sample.md -o sample.pdf --lua-filter pagebreak.lua
produces this output:
The solution is to look for the macros in the `Para()` function of the filter. A complication is that `commonmark+sourcepos` splits the macros into two parts and wraps them in `Span`, so the `Para()` function needs to handle that case too.
You can make this work in CommonMark with
```{=latex}
\pagebreak
```
This requires the `raw_attribute` extension, which is enabled by default in `commonmark_x`.
Sure, but my thinking went as follows:
In favour of the change:
- There are a lot of existing documents using the simpler syntax, and they'll all be broken if Pandoc transitions to CommonMark without this change. It was one of the first issues I saw when I tried to use the `sourcepos` extension in R Markdown documents.
- Markdown is supposed to be readable, and it's more readable than the fenced solution.
Against the change:
- It doesn't fit the CommonMark design very well, which is the reason the `raw_tex` extension is incompatible with the `commonmark` reader. The spec says "Backslashes before other characters are treated as literal backslashes".
But CommonMark doesn't provide a way to enter a page break, so it needs to be some kind of extension, and this seems like a fairly harmless one. People who really want paragraphs containing nothing but \newpage or \pagebreak should just avoid using the filter.
I think my preferred solution here would be to create a new filter that converts the special paragraphs into LaTeX, e.g.,
    -- is_pagebreak is a helper that still needs to be written
    function Para (p)
      if is_pagebreak(p) then
        return pandoc.RawBlock('latex', pandoc.utils.stringify(p))
      end
    end
Users would run the filter before `pagebreak.lua`.
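The `is_pagebreak` helper is left undefined in the snippet above. Here is a minimal sketch of how it might look, assuming a paragraph counts as a page break only when its flattened text is exactly `\newpage` or `\pagebreak`. The names `PAGEBREAK_MACROS` and `is_pagebreak_text` are illustrative and not part of the existing filter. Since `pandoc.utils.stringify` flattens nested inlines, this should also cover the `Span`-wrapped pieces produced by `sourcepos`, though that case has not been verified here.

```lua
-- Sketch of a pre-filter; meant to run before pagebreak.lua.
-- Assumption: only these two macros are treated as page breaks.
local PAGEBREAK_MACROS = {
  ['\\newpage'] = true,
  ['\\pagebreak'] = true,
}

-- Pure helper (hypothetical name): true if `text`, ignoring surrounding
-- whitespace, is exactly one of the page-break macros.
local function is_pagebreak_text (text)
  local trimmed = text:match('^%s*(.-)%s*$')
  return PAGEBREAK_MACROS[trimmed] == true
end

-- stringify concatenates the text of all nested inlines, so a macro that
-- was split into several Span-wrapped parts is joined before comparing.
local function is_pagebreak (para)
  return is_pagebreak_text(pandoc.utils.stringify(para))
end

function Para (p)
  if is_pagebreak(p) then
    return pandoc.RawBlock('latex', pandoc.utils.stringify(p))
  end
end
```

If this were saved as, say, `para-to-raw.lua` (a made-up file name), it could be passed with its own `--lua-filter` option ahead of `--lua-filter pagebreak.lua`, since pandoc applies filters in the order given.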
There are two reasons for that:
- It's cleaner.
- Making the filter act on Para elements has a significant performance impact; most users should not have to pay that.
I'd be more open to adding support for special divs, so `commonmark_x` users could write
::: pagebreak
:::
or
{.pagebreak}
---
For plain CommonMark, an HTML-based syntax could be acceptable:
<hr class="pagebreak"/>
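A rough sketch of how a filter might recognize the div and HTML variants, assuming LaTeX output only. The `{.pagebreak}` form on a thematic break is left out, because its representation in the AST is not established in this thread. The helper names `latex_pagebreak` and `is_pagebreak_hr`, and the pattern used for the `<hr>` tag, are illustrative assumptions.

```lua
-- Assumption: LaTeX output only; a real filter would branch on FORMAT.
local function latex_pagebreak ()
  return pandoc.RawBlock('latex', '\\newpage')
end

-- Pure helper (hypothetical name): true if `html` is a lone <hr> tag whose
-- only attribute is class="pagebreak".
local function is_pagebreak_hr (html)
  return html:match('^%s*<hr%s+class%s*=%s*"pagebreak"%s*/?>%s*$') ~= nil
end

-- Handles `::: pagebreak` fenced divs.
function Div (el)
  if el.classes:includes('pagebreak') then
    return latex_pagebreak()
  end
end

-- Handles `<hr class="pagebreak"/>` when it arrives as a raw HTML block.
function RawBlock (el)
  if el.format == 'html' and is_pagebreak_hr(el.text) then
    return latex_pagebreak()
  end
end
```

Neither handler touches `Para` elements, so ordinary paragraphs pass through unexamined.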
The existing filter already works on Para elements: it looks for a single FF (form feed) character there. The proposed change makes that test more complicated, so it will be slower, but is it really enough of a difference to be noticeable? (In the context where I'm using it I think the answer is almost certainly no: I run knitr, then Pandoc, then pdflatex. The Pandoc step is almost always very quick compared to the others.)
You're right. I forgot about that. I'm still hesitant to add this kind of special case here.
Regarding your proposed syntax choices: I think the one using `:::` is the most readable, so it's the one I'd choose if new syntax is needed. But the backward compatibility of `\pagebreak` (and its familiarity to people who know LaTeX) is still a point in its favour.
I've moved the code for the pagebreak filter to pandoc-ext/pagebreak. The code has been updated to be more configurable; it would now be easier to implement the suggested changes without the mentioned drawbacks. PRs welcome.
Closing this here.