jgm/pandoc

Template snippet won't be resolved using `includes` such as `-H`, `-B`, `-A`

ickc opened this issue · 18 comments

ickc commented

I just realized I've probably made a mistake. I opened an issue in jgm/pandoc-templates#220, probably because I was working with templates and was in that repository at the time. But thinking about it now, it is not an issue about any of the templates over there! The issue is about how pandoc handles templates, so it belongs here. I'm sorry about that. Here's the original post from over there. There are a few discussions there already, and I am closing the issue over there. Sorry again!

  1. I found that if I write a template fragment and include it with -H/--include-in-header=, the $...$ is left as is and not processed.
  2. I also found that although pandoc will search the user data directory (e.g. ~/.pandoc/templates/) for --template=<FILE>, it won't search the user data directory when -H <FILE> is used.

I searched for issues related to these but found none, so others don't seem to consider this a problem. Still, it seems that implementing these 2 behaviors would do no harm.

The use case for the above behavior is to have fragments of custom templates. Rather than maintaining 1 complete template modified from the official one, I can put my custom code in separate files. This approach has the following advantages:

  1. No need to manually merge changes from the official template into the custom ones. If the default template is left untouched, the template "version" stays in sync with the pandoc version when pandoc is updated. Manual merging is even more painful when one has more than 1 custom template for the same format.
  2. More modular: all the benefits of "static" includes (which don't process the $...$) will benefit custom template writing as well.
  3. Custom templates usually (or is that true?) are slightly modified from the official one. E.g. the LaTeX output expects a lot of packages, and it would be very difficult to write a totally new template from the ground up. If all we do is take the default template and add a few lines, which often do not interact with the original code, then using includes makes those templates easier to write.

An example of include-template is in ickc/pandoc-templates at include-in-header.

To make an analogy, in Jekyll, there's liquid templating. The "liquid" will be processed whether or not it is in includes or layouts (or even the "main document" but that shouldn't apply here).

To conclude, the proposal is

  1. If a file is included (say by -H/--include-in-header=), it is inserted into the template before the template (those $...$) is processed.
  2. Whenever an include is used, pandoc also searches for the file in data-dir. We could use the same folder as the templates do: templates. Or, if we want to keep them separate, perhaps a folder named includes.

And to make it clear, although I mentioned include-in-header, --include-before/after-body= should be treated equally.

cagix commented

I remember some earlier discussion on this topic, but cannot find the threads ... For some reason the -H includes are not processed like a template, and one would have to use a double run of pandoc, as I outlined in https://github.com/jgm/pandoc-templates/issues/220#issuecomment-250668371 ...

@jgm Can you remember the reasons for this?

It certainly would make things easier, if pandoc would include the -H files before processing the $...$ in the template.

jgm commented

In general it wouldn't be desirable to recursively resolve
the templates. Say you're including a LaTeX snippet; it
might contain some math using $, and this would produce
a syntax error in the template processor (or at the very
least, undesirable results).
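
To illustrate the concern, here is a minimal Python sketch. The function `naive_interpolate` is a hypothetical toy that substitutes `$name$` with a regular expression; it is not pandoc's template engine, which may instead report a syntax error on such input. It only shows how a plain LaTeX snippet containing math can be altered once it is treated as a template.

```python
import re

def naive_interpolate(text, variables):
    """Toy stand-in for a template engine: replace every $name$ with its value.

    Unknown names become the empty string. This is an assumption made for
    illustration and is NOT how pandoc actually parses templates.
    """
    return re.sub(r"\$([A-Za-z][\w.-]*)\$",
                  lambda m: str(variables.get(m.group(1), "")),
                  text)

# A plain LaTeX header snippet that happens to contain inline math.
latex_snippet = r"\newcommand{\unitx}{$x$ metres}"

# Treating the snippet as a template silently drops the math "$x$",
# because "x" looks like an unset template variable.
mangled = naive_interpolate(latex_snippet, {"title": "Demo"})
print(mangled)
```

With this toy, `$x$` is read as an unset variable and disappears from the output, which is the kind of undesirable result described above.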

ickc commented

Will you consider giving an option to control this behavior?

To borrow an analogy, in Jekyll, if and only if a file starts with a YAML front matter, it will be processed including the liquid template language.

So if we don't want to introduce more command line options, we can do it exactly the same way. Since YAML front matter is not valid in the included file itself (be it html, tex, etc.), its presence distinguishes a file I want processed as a template from a plain one.
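
A rough Python sketch of how such a check could look. The function `split_front_matter` and the convention it implements are hypothetical; pandoc does not do this for include files. Accepting `...` as a closing line is an assumption borrowed from pandoc's YAML metadata blocks.

```python
def split_front_matter(text):
    """Return (is_template, body) for the contents of an include file.

    Hypothetical convention: a file whose first line is '---' and that has a
    later closing '---' or '...' line is treated as a template, and the front
    matter is stripped from the body. Anything else is treated as plain text.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return False, text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() in ("---", "..."):
            return True, "".join(lines[i + 1:])
    # Unterminated front matter: fall back to treating the file as plain.
    return False, text
```

A plain LaTeX include would come back unchanged with `is_template` false, so existing includes would keep working under this scheme.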

ickc commented

To give more details on the use case behind the issue, an example is ickc/pandoc-amsthm. That "package" is not just a filter, but a filter that comes with templates. Both are required for it to work. (Basically it defines the amsthm through YAML front matter. Then, through the templates, it auto-generates the necessary header/preamble to define the amsthm in HTML/TeX. Then the filter transforms pandoc native divs into LaTeX environments.)

What I did was include templates for TeX and HTML output, but that overly restricted its use, since people might need their own custom template. I then created a fragment that includes only the code related to amsthm, and said it could be used to create a custom template. But that's a manual process.

If I want to mention a way to do it automatically, currently I need to say something like:

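# Requires bash (process substitution); EXT is a placeholder for tex or html.
MD=$(<INPUT.md)
# Inner pandoc renders the snippet with the document's metadata; outer pandoc includes the result.
echo "$MD" | pandoc --include-in-header=<(echo "$MD" | pandoc --template=pandoc-amsthm.EXT) -o "OUTPUT.EXT"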

While it is a nice trick, it looks very daunting. (What I wanted was to provide a very straightforward way to deal with amsthm in pandoc, so the last thing I want is to make it look daunting.)
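
For readers who prefer a script to process substitution, the same round trip can be sketched in Python. The function names and the temporary file name `header.inc` are hypothetical; the pandoc options used (`--to`, `--template`, `--include-in-header`, `-o`) are existing ones. This assumes pandoc is on the PATH and that the snippet template contains no `$body$`.

```python
import os
import subprocess
import tempfile

def two_pass_commands(source, snippet_template, rendered, output, to_format):
    """Build the two pandoc invocations of the round trip (nothing is run)."""
    # Pass 1: render the template snippet using the document's metadata.
    render = ["pandoc", source, "--to", to_format,
              f"--template={snippet_template}", "-o", rendered]
    # Pass 2: convert the document, including the rendered snippet in the header.
    convert = ["pandoc", source, f"--include-in-header={rendered}", "-o", output]
    return render, convert

def run_two_pass(source, snippet_template, output, to_format="latex"):
    """Run both passes, keeping the rendered snippet in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        rendered = os.path.join(tmpdir, "header.inc")
        render, convert = two_pass_commands(
            source, snippet_template, rendered, output, to_format)
        subprocess.run(render, check=True)
        subprocess.run(convert, check=True)
```

Something like `run_two_pass("INPUT.md", "pandoc-amsthm.latex", "OUTPUT.tex")` would then stand in for the one-liner, at the cost of an extra script to distribute.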

jgm commented

If you include Text.Pandoc.Templates in your filter,
you can use its functions to resolve your custom
template, and then insert the result into header-includes.
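
The comment above refers to the Haskell module Text.Pandoc.Templates. As a different, swapped-in technique, the sketch below builds the header text by hand in Python and appends it to header-includes in the JSON AST. It assumes the JSON layout with top-level `meta` and `blocks` keys; the metadata key `greekfont` and all function names are hypothetical.

```python
import json
import sys

def meta_to_text(value):
    """Flatten a simple metadata value from pandoc's JSON AST to a string."""
    tag, content = value["t"], value.get("c")
    if tag == "MetaString":
        return content
    if tag == "MetaBool":
        return "true" if content else "false"
    if tag == "MetaInlines":
        parts = []
        for inline in content:
            if inline["t"] == "Str":
                parts.append(inline["c"])
            elif inline["t"] == "Space":
                parts.append(" ")
        return "".join(parts)
    return ""  # other metadata types are ignored in this sketch

def add_header_include(doc, fmt, text):
    """Append raw text to header-includes, keeping any existing entries."""
    raw = {"t": "MetaBlocks", "c": [{"t": "RawBlock", "c": [fmt, text]}]}
    meta = doc["meta"]
    existing = meta.get("header-includes")
    if existing is None:
        meta["header-includes"] = {"t": "MetaList", "c": [raw]}
    elif existing["t"] == "MetaList":
        existing["c"].append(raw)
    else:
        meta["header-includes"] = {"t": "MetaList", "c": [existing, raw]}
    return doc

def apply_filter(doc):
    """Hypothetical filter body: define a Greek font if 'greekfont' is set."""
    font = doc["meta"].get("greekfont")
    if font is not None:
        line = "\\newfontfamily\\greekfont{" + meta_to_text(font) + "}"
        add_header_include(doc, "latex", line)
    return doc

def main():
    """Entry point when used as a pandoc JSON filter (stdin to stdout)."""
    json.dump(apply_filter(json.load(sys.stdin)), sys.stdout)
```

A real filter script would call `main()` at the end and be passed to pandoc with `--filter`. The conditional logic of the snippet lives in the filter here, so no template resolution is needed.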

ickc commented

> If you include Text.Pandoc.Templates in your filter,
> you can use its functions to resolve your custom
> template, and then insert the result into header-includes.

Thanks for the info. Seems like it should be something I use. I didn't see it used in the example from jgm/pandocfilters, so I didn't know it before.

But the point is still the same though. For example, there's another template snippet I personally used like this:

$if(ucharclasses)$
  \usepackage[Latin$if(ucharclassesgreek)$,Greek$endif$$if(ucharclasseshebrew)$,Hebrew$endif$]{ucharclasses}
  \usepackage{xltxtra,xunicode}
  \usepackage{unicode-math}
    \newcommand{\latinfont}{\renewcommand\rmdefault{lmr}\renewcommand\sfdefault{lmss}\renewcommand\ttdefault{lmtt}\defaultfontfeatures[\rmfamily,\sffamily]{Ligatures=TeX}}
  \setTransitionsForLatin{\latinfont}{}
$if(ucharclassesgreek)$
  \newfontfamily\greekfont{$ucharclassesgreekfont$}
  \setTransitionsForGreek{\greekfont}{}
$endif$
$if(ucharclasseshebrew)$
  \newfontfamily\hebrewfont{$ucharclasseshebrewfont$}
  \setTransitionsForHebrew{\hebrewfont\setRTL}{\setLTR}
  \usepackage{bidi}
$endif$
$else$
$endif$
---
ucharclasses:   true
ucharclassesgreek:  True
ucharclassesgreekfont:  Cardo
ucharclasseshebrew: True
ucharclasseshebrewfont: Cardo
...

This doesn't involve a filter, so if I want to distribute this snippet but not a whole template, the "round trip" trick of using pandoc twice is needed.

The point is that if there were an option to recursively resolve templates in the includes, we could have a lot of "pandoc-extras" in the form of template snippets, not just filters.

ickc commented

@jgm

> If you include Text.Pandoc.Templates in your filter,
> you can use its functions to resolve your custom
> template, and then insert the result into header-includes.

Does that require Haskell? Or is it supported in pandocfilters?

ickc commented

I encountered a related issue:

If pandoc ... -H <file> ... is used on the command line,

and the md file's YAML contains:

header-includes:
    - \usepackage{siunitx}

The YAML's header-includes will be overridden.

I know the manual specifically says that command line options override YAML metadata. But on the other hand, since header-includes is a repeatable argument, it seems reasonable to expect pandoc to combine both (the YAML's and the command line's header-includes).

This is kind of related to this issue, in the sense that it's an effort to split header-includes/templates into snippets.

But on the other hand, the need to sanitize code might be the reason for the current behavior. And of course, if the behavior is changed, backward compatibility would be an issue.

Is there currently a way to have header-includes in both the command line and YAML?

jgm commented

Note this:

% echo "" | pandoc -t native -s -M foo=1 -M foo=2
Pandoc (Meta {unMeta = fromList [("foo",MetaList [MetaString "2",MetaString "1"])]}) []

If you specify foo multiple times on the command line, you
get a list. It seems to me that it would be reasonable
for things to work the same way if foo appears in both
document metadata and command line metadata. I don't think
this would be too difficult a change.

However, it's possible that it would break some workflows.
Some authors may be relying on the ability to selectively
override elements of metadata on the command line. So I'd
be reluctant to make this change -- at least not without
a full discussion on pandoc-discuss.
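
The two behaviours being weighed can be contrasted in a small Python sketch. Plain dictionaries stand in for document and command-line metadata, the function names are hypothetical, and the order of the combined list is an assumption (the native output above shows pandoc listing the later value first).

```python
def merge_override(document_meta, cli_meta):
    """Behaviour reported in this thread: a command-line value replaces the document's."""
    merged = dict(document_meta)
    merged.update(cli_meta)
    return merged

def as_list(value):
    """Wrap a single value in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]

def merge_accumulate(document_meta, cli_meta):
    """Proposed behaviour: a key set in both places becomes a list of all values."""
    merged = dict(document_meta)
    for key, value in cli_meta.items():
        if key in merged:
            merged[key] = as_list(merged[key]) + as_list(value)
        else:
            merged[key] = value
    return merged

doc_meta = {"title": "T", "header-includes": ["\\usepackage{siunitx}"]}
cli_meta = {"header-includes": ["\\usepackage{bidi}"]}
print(merge_override(doc_meta, cli_meta)["header-includes"])
print(merge_accumulate(doc_meta, cli_meta)["header-includes"])
```

The accumulating version keeps the document's `\usepackage{siunitx}`, but it also removes the ability to override a single value from the command line, which is the workflow risk raised above.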

ickc commented

> at least not without
> a full discussion on pandoc-discuss

I will open a thread there discussing this.

Do you also want to discuss the recursive template behavior, or is it settled and won't be changed?

ickc commented

I opened a pandoc-discuss thread about repeated metadata across YAML and the command line: Discussion needed—How should pandoc handle when meatada in YAML collide with command line option - Google Groups.

Regarding recursively resolving templates, I'm wondering whether it is possible to have a command line option, similar to --file-scope, that triggers an alternative build process. At the very least, this build process would use the trick above to resolve the template snippets before they are included.

xdbr commented

Only this week I stumbled over exactly this behavior while using @ickc's amsthm template+filter: if a template snippet is included using -H and carries variables/metadata from the YAML header, these variables are not interpolated.

I fully agree with @ickc's rationale that straightforward template snippets would alleviate the need for every user to build custom templates, which then need constant merging with the upstream template. What if I want to use multiple customizations from multiple places? This will quickly get out of hand.

This finally got me thinking whether a --plugin option might be a good idea? Something along the lines of, e.g.:

  • if --plugin=ams is specified and
  • if a metadata top-level block ams: exists
    • a template ams.latex or ams.html, etc is searched for, interpolated, and taken into account
    • as well as the corresponding filter ams.py or ams.hs, etc.

Of course, this is only a rough sketch, but it might make it super-simple for (multiple) extensions to be used independently.

Clearly, it would need a little more differentiation, namely on where to include the snippet(s), so one idea might be to be able to have the snippets named after their position of inclusion:

  • ams-header.{latex,html,etc.}
  • ams-before.{latex,html,etc.}
  • ams-after.{latex,html,etc.}
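
The naming scheme sketched above could be expressed as a small lookup in Python. Note that `--plugin` is only a proposal in this thread, not an existing pandoc option, and `plugin_files` is a hypothetical helper that merely lists the candidate file names.

```python
def plugin_files(name, fmt, filter_exts=("py", "hs")):
    """List candidate snippet and filter file names for a proposed plugin.

    Snippets are keyed by where they would be included (header, before-body,
    after-body); filters are tried in the order of filter_exts.
    """
    snippets = {pos: f"{name}-{pos}.{fmt}" for pos in ("header", "before", "after")}
    filters = [f"{name}.{ext}" for ext in filter_exts]
    return snippets, filters

snippets, filters = plugin_files("ams", "latex")
print(snippets["header"], filters)
```

Under this scheme the three snippet positions would map onto what -H, -B and -A do today, with the snippets interpolated before inclusion.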

Lastly, if plugins like these existed, we might eventually come up with something like a "package repository" for different modules/customizations/plugins, who knows...

BTW: I didn't think of @ickc's round-trip "hack", pretty nifty!

ickc commented

Hi, @xdbr, this is partly my bad: I didn't include everything necessary in the filter, but took a filter+templates approach (which has the advantage of allowing one to tweak the template snippet to their liking), see ickc/pandoc-amsthm#12.

But I still agree there should be an option to recursively resolve template snippets. Some things are simple enough to require only a template snippet, and the round-trip hack is too convoluted (@cagix told me this trick first).

A possible way to make everyone happy and stay backward compatible is to have a command line option --template-depth=NUMBER that controls the depth to which templates are resolved, with a default of 0.

xdbr commented

Hey @ickc: what do you actually think about the --plugin outline? I think it would be an easy-to-approach path for individual customization. Should we consider/discuss this under a separate ticket, or what would be the best place to do so?

ickc commented

Hi, @xdbr,

None of the following is finalized yet:

  1. I'm planning to rewrite the amsthm filter (probably using panflute) so that it doesn't depend on external template snippets (and also extends its functionality). See the issues in ickc/pandoc-amsthm.
  2. We have a plan to create a way to specify arbitrary panflute filters in YAML. It could apply to other pandoc filters as well.

Again, nothing promised, but they might happen. Stay tuned (in pandoc-discuss).

If these are done, they would solve part of the problem above. For example, rather than giving pandoc an arg like --plugin, you could specify the filters in the YAML of the document following a certain convention (yet to be developed).

However, even after all this, I think there's still a need for template snippets (none of the above directly addresses that), because some things are simple enough to be done via a template snippet rather than by writing a filter. I think the simplest way would be to give pandoc a new arg --template-depth=NUMBER defaulting to 1, making it backward compatible.

ickc commented

> So I'd
> be reluctant to make this change -- at least not without
> a full discussion on pandoc-discuss.

There hasn't been much discussion in Discussion needed—How should pandoc handle when meatada in YAML collide with command line option - Google Groups, maybe because of a bad title.

Do you think this should be changed? Preventing header-includes from being overridden by -H would certainly make header-includes more "reliable" to use, which can also be handy in filter writing.

ickc commented

Just some updates:

@xdbr

The few things I said before:

@jgm

For the 2 related problems:

  1. recursively resolving template snippets
  2. includes like header-includes in YAML metadata being overridden by CLI options

The first one is an old one (from long before this issue was opened) and has a known workaround, as discussed above. So if it isn't going to be changed, I'm OK with it and we can close this issue.

The second one is really a different issue, and the discussions here and in pandoc-discuss have not resulted in anything conclusive. The main problem is that there's no known simple workaround for it (except preprocessing the source markdown?). So if it is something you think should be changed, I can open another issue about it.

jgm commented

Closing in favor of #3139 for issue (2).