jgm/pandoc

Command-line options --css and --include-in-header override corresponding metadata fields instead of accumulating

Oblomov opened this issue · 26 comments

Hello,

I just noticed that using the command-line -c (respectively, -H) option overrides any css: (respectively, header-includes:) keys in the YAML metadata sections.

This is a bit surprising, since the options themselves are otherwise cumulative (one can specify multiple -c and -H options, and they get all used), and makes it much harder to have both common and file-specific CSS and header includes (in this sense, this bug is related to #3115).

I believe that, to avoid violating the principle of least surprise, the -c and -H options should be cumulative with the file metadata (and specifically added after the internal ones). OTOH, such a change in behavior would break backwards compatibility, so maybe a way to specify additional CSS and header-includes from the command line should be given?

(If the replacing behavior is preserved, it would also be a nice idea to document it in the manual.)

ickc commented

There's some sort of discussion about this in #3138. @jgm mentioned it is simple to make the change, but since it breaks backward compatibility (some people might rely on this to override YAML's metadata), it requires a discussion in pandoc-discuss: Discussion needed—How should pandoc handle when meatada in YAML collide with command line option - Google Groups.

cagix commented

@jgm @ickc is there any news on this topic? The linked discussion appears to be out of date?

It would be very helpful to implement this merging behavior. If there are concerns about backward compatibility, maybe a new option could enable the old behaviour?

+1 to making --include-in-header append to header-includes rather than overwriting it.

The current behavior is leading to a poor user experience in the wild. See manubot/rootstock#275: the pandoc-eqnos filter we use now prints the following stderr:

pandoc-eqnos: Wrote the following blocks to header-includes.  If you
use pandoc's --include-in-header option then you will need to manually
include these yourself.

    <!-- pandoc-eqnos: equation style -->
    <style>
      .eqnos { display: inline-block; position: relative; width: 100%; }
      .eqnos br { display: none; }
      .eqnos-number { position: absolute; right: 0em; top: 50%; line-height: 0; }
    </style>

jgm commented

See #5881. I think the changes I'm proposing there will also have the effect that include-in-header will append rather than overwriting.

jgm commented

The issue here is a subtle one: when pandoc populates variables for a template, metadata values are used only if the corresponding variable doesn't have a value set on the command line. And the effect of --include-in-header is to set the variable. That's why it overrides.
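
A minimal sketch of the lookup rule described above, in Python. This is an illustration only, not pandoc's actual implementation: the function name `resolve_template_value` and the dictionaries are hypothetical, and the only behaviour modelled is "a variable set on the command line wins, metadata is a fallback".

```python
def resolve_template_value(name, variables, metadata):
    """Return the value a template would see for `name`.

    Models the rule from the comment above: if a variable with this name
    was set on the command line, it is used as-is; the metadata value is
    consulted only when no such variable exists.
    """
    if name in variables:
        return variables[name]
    return metadata.get(name)


# --include-in-header acts as if it set the `header-includes` variable,
# so the metadata entry with the same name is never consulted.
variables = {"header-includes": ["<!-- from -H common.html -->"]}
metadata = {"header-includes": ["<!-- from YAML front matter -->"]}

print(resolve_template_value("header-includes", variables, metadata))
```

Under this model the YAML entry is dropped as soon as any -H option is given, which matches the overriding behaviour reported in this issue.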

One solution in your filter would be to check the metadata. If there is a header-includes field already, then you can append your content to this instead of setting a variable.
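
A sketch of that suggestion for a JSON filter, assuming the usual pandoc JSON encoding of metadata (`MetaList`, `MetaBlocks`, `RawBlock`). The helper name `append_header_include` is hypothetical, and the sample document is made up for the demonstration.

```python
import json


def append_header_include(doc, fmt, text):
    """Append a raw block to meta['header-includes'], keeping what is there.

    `doc` is a pandoc JSON document (a dict with "meta" and "blocks").
    An existing single value is wrapped in a list rather than replaced.
    """
    meta = doc.setdefault("meta", {})
    new_item = {"t": "MetaBlocks", "c": [{"t": "RawBlock", "c": [fmt, text]}]}
    existing = meta.get("header-includes")
    if existing is None:
        items = []
    elif existing.get("t") == "MetaList":
        items = list(existing["c"])
    else:
        items = [existing]  # single value: keep it as the first list entry
    meta["header-includes"] = {"t": "MetaList", "c": items + [new_item]}
    return doc


# Made-up document that already carries one header-includes entry.
doc = {
    "pandoc-api-version": [1, 20],
    "meta": {
        "header-includes": {"t": "MetaInlines", "c": [{"t": "Str", "c": "existing"}]}
    },
    "blocks": [],
}
append_header_include(doc, "html", "<style>.eqnos br { display: none; }</style>")
print(json.dumps(doc["meta"]["header-includes"], indent=2))
```

This only protects entries already present in the metadata. It does nothing about the case where --include-in-header is given on the command line, since that option sets a variable rather than metadata.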

metadata values are used only if the corresponding variable doesn't have a value set on the command line. And the effect of --include-in-header is to set the variable. That's why it overrides.

I see. So it is out of the question to change metadata values such that they are not discarded, but added to, when command line options are also specified? Currently --include-in-header is acting more like --replace-header-includes.

ec043e0 seems like it will give a way to accumulate --include-in-header command line options with preexisting include-in-header fields in defaults files. Is the whole purpose of defaults to have an alternative to metadata where command line options don't discard existing values?

If there is a header-includes field already, then you can append your content to this instead of setting a variable.

My earlier example was in relation to @tomduck's pandoc-eqnos package, so tagging him in case this is useful. I think the issue is that the filter is unaware whether the user specifies --include-in-header and therefore is unable to know whether the header-includes metadata it sets will be discarded. If defaults can be accessed by a filter perhaps this would enable a way for the filter to add a --include-in-header to defaults as a workaround?

How does a defaults.include-in-header interact with metadata.header-includes?

Thanks for tagging me in, @dhimmel. Pandoc-eqnos does what @jgm suggests: it appends to the header-includes metadata if it exists. A warning is printed in the case that there is a risk of the header-includes being overridden. Instructions are given so that the user knows what to write into header-includes files. Cheers, Tom.

jgm commented

So it is out of the question to change metadata values such that they are not discarded, but added to, when command line options are also specified? Currently --include-in-header is acting more like --replace-header-includes

I wouldn't say "out of the question," but a big change like this could break behaviors that others are relying on, so I'm reluctant to do this without sufficient discussion and reflection.

Is the whole purpose of defaults to have an alternative to metadata where command line options don't discard existing values?

No, I wouldn't say it's the whole purpose. But one thing it is good for is getting rid of "abuse of metadata." Ideally, metadata should be metadata, not option settings and instructions for creating output. That belongs elsewhere. With default files, you'd be able to put this in a defaults file and focus on content (and genuine metadata) in your markdown file.

A warning is printed in the case that there is a risk of the header-includes being overridden. Instructions are given so that the user knows what to write into header-includes files.

This is not a great solution IMO because it requires always showing a warning, regardless of whether it's relevant (can confuse users as seen in tomduck/pandoc-tablenos#16). And if it is relevant, the user is required to duplicate content that belongs as part of the upstream filter.

Here's what a filter has access to when the --include-in-header command option is specified:

> echo " " | pandoc --to=json --include-in-header=README.md
{"blocks":[],"pandoc-api-version":[1,20],"meta":{}}

Here's what a filter has access to when the --include-in-header command option is specified as well as defaults.include-in-header. defaults/html.yaml contains something like:

include-in-header:
- build/themes/default.html
- build/plugins/anchors.html
> echo " " | pandoc --defaults defaults/html.yaml --to=json --include-in-header=README.md
{"blocks":[],"pandoc-api-version":[1,20],"meta":{}}
# note that if --to=json, --include-in-header and defaults.include-in-header accumulate

So the filter is in a tough position, because it cannot know whether --include-in-header or defaults.include-in-header have been specified. Some solutions:

  1. --include-in-header command line options and defaults get converted to metadata.header-includes prior to being passed to filters. They could still overwrite preexisting metadata.header-includes.
  2. --include-in-header options append to metadata.header-includes rather than replace it. This behavior seems most aligned with the option name.
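
To make option 2 concrete, here is a small Python sketch of the two semantics side by side. Nothing here is pandoc code: `merge_header_includes` and its `override` flag are hypothetical names, and the ordering (metadata first, command line after) follows the suggestion in the original report.

```python
def merge_header_includes(metadata_includes, cli_includes, override=False):
    """Combine header includes from metadata and from the command line.

    override=False models option 2: metadata entries come first and the
    command-line entries are appended after them.
    override=True models the current behaviour: command-line entries,
    when present, replace the metadata entries entirely.
    """
    if override and cli_includes:
        return list(cli_includes)
    return list(metadata_includes) + list(cli_includes)


from_yaml = ["\\usepackage{mypackage}"]
from_cli = ["\\input{common-macros.tex}"]

print(merge_header_includes(from_yaml, from_cli))
print(merge_header_includes(from_yaml, from_cli, override=True))
```

Keeping an explicit switch for the replacing behaviour is one way to add accumulation without taking away the ability to override.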

Balancing metadata, defaults, and command line options is complicated!

+1 to have include-in-header append to header-includes instead of overwriting it.

Could we have a practical example of why the appendment-supporters would want to include CSS/Headers in both YAML metadata as well as on the commandline during invocation?

If you're using CSS/headers in the YAML metadata:

  • Currently, there are at least 4 ways to append:
    • append to the in-file YAML metadata
    • add a YAML metadata block as a separate (markdown-type) file in the input files
    • convert the YAML metadata to an external file and use the commandline argument (--css or --include-in-header)
    • use a defaults file (using either metadata: or (preferably) include-in-header:/css:, since the header/css aren't really "genuine metadata" as @jgm put it).
  • Currently, there is only one clear way to override: use the commandline argument to override YAML metadata or defaults file.

If the commandline argument also becomes equivalent to appending, then there is no clear way to override. This would be a loss of functionality.

So there seem to be reasons apart from backward compatibility for maintaining the current behaviour.

But confusingly, when it comes to other metadata, the behaviour is very inconsistent. I tried adding "author:" metadata in 5 places: the in-file pandoc-style metadata (using %, which can only be added in the very beginning of the file, i.e., before the YAML metadata), in-file YAML (after the pandoc-style metadata), -M author=name, in the defaults file using metadata:, and in a metadata.yaml file invoked using --metadata-file. What I found is this order of precedence where > indicates overwriting and = indicates appending:

--metadata = --defaults > in-file YAML > in-file % > --metadata-file
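
The observed order can be written down as a small Python model. This is only a restatement of the experiment above for a single field, not a description of how pandoc is implemented; the names `PRECEDENCE` and `effective_value` are made up for the sketch.

```python
# Highest precedence first. Sources in the same tuple are combined ("=");
# a higher tuple replaces everything below it (">").
PRECEDENCE = [
    ("--metadata", "--defaults"),
    ("in-file YAML",),
    ("in-file %",),
    ("--metadata-file",),
]


def effective_value(values_by_source):
    """Return the list of values that survive, given a value per source."""
    for group in PRECEDENCE:
        combined = [values_by_source[s] for s in group if s in values_by_source]
        if combined:
            return combined
    return []


print(effective_value({"--metadata": "A", "--defaults": "B", "in-file YAML": "C"}))
print(effective_value({"in-file %": "P", "--metadata-file": "F"}))
```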

So while there is an argument to be made for allowing for overriding/overwriting, there is also a need to have some non-arbitrary rule as to the rules of precedence, and when and how they differ between different classes of things (like metadata vs. variables), especially when some things have their own flag and can be part of YAML metadata as well (--csl vs. csl:).

There are at least two other open issues linked to this: #3115 (comment) and #4057

My use case is to produce different versions of the document / in scripting contexts. Some headers are required by all versions of the document, so it makes sense to add them to the file itself, others are customizations that depend on the specific versions, which are overridden on the command line on a per-production case. Of course there are workable alternatives (like creating huge command lines with all the includes or running the document through a preprocessor like gpp), but it would be better to have this cooked in.

The manual currently states:

Note that, where command-line arguments may be repeated
(--metadata-file, --css, --include-in-header,
--include-before-body, --include-after-body, --variable,
--metadata, --syntax-definition), the values specified on
the command line will combine with values specified in the
defaults file, rather than replacing them.

(text introduced by this commit: ec043e0)

However, the example I've cited above shows that a metadata value set in --metadata-file does not[^1] "combine with the value specified in the defaults file", but is replaced by the value specified in the defaults file.

Footnotes

  1. This holds true at least in one case, since I only tested with one metadata value.

0az commented

Adding a +1.

I have a set of common macros for note-taking and other writing. For certain inputs, I want to use additional packages, including some that are relatively heavyweight. Ideally, I'd be able to add to header-includes in the file itself, while still sharing the common header includes file.

I could fork the template and modify it myself, but frankly, maintaining and updating a template is something that I'd rather not spend time doing, even if it's as simple as git pull --rebase upstream main.

I also have a Makefile, so I could just write the Python for a pre-pandoc step to generate a metadata file, or even hack something together (header-includes: %%% HEADER INCLUDE REPLACE TARGET %%%, replacing that in generated Tex, calling latexmk)... but again, this is a lot of complexity for what should be a relatively simple task.

jgm commented

However, the example I've cited above shows that a metadata value set in --metadata-file does not[^1] "combine with the value specified in the defaults file", but is replaced by the value specified in the defaults file.

I think you're misreading the manual. The manual says that the value given to the --metadata-file option will accumulate -- what it means is that if you say --metadata-file A --metadata-file B, you'll get metadata from both A and B. It says nothing about how the values specified in a metadata file will be treated.

My use case is to produce different versions of the document / in scripting contexts. Some headers are required by all versions of the document, so it makes sense to add them to the file itself, others are customizations that depend on the specific versions, which are overridden on the command line on a per-production case. Of course there are workable alternatives (like creating huge command lines with all the includes or running the document through a preprocessor like gpp), but it would be better to have this cooked in.

@Oblomov, could you satisfy this use case by simply having different defaults files (with, if need be, different metadata-files: as well)? I realize that this issue is from 2016, when defaults files didn't exist: perhaps they now address your need?

I see the metadata file as a different hack / workaround, but still not a “clean” solution. I could create different metadata files for the different use cases, which just shifts the problem around from humongous command lines to some kind of scripting to create the metadata file. The fundamental issue —i.e. that some options override each other with different priorities, and there is no way to specify they should be merged instead— remains unsolved, regardless of interface.

Weighing in with another +1. My use case is that I have a Lua filter used to render certain classes of div using the LaTeX package boxedminipage2e. I can add a suitable raw inline to the metadata header-includes, but when -V header-includes=... is set on the command line, the metadata is ignored. The solution proposed here, to combine metadata fields and variables when the variables can accumulate, seems workable to me.

If we were to implement #5221 (and #6731) in a way that would allow write access to the writer options, then this could be solved with a filter. Not sure if that's a good idea though.

jgm commented

I think it starts to look like a bit of a mess if filters can not only alter the AST, but also change writer options. Then filters are two things, not one. So my first instinct is that this isn't a good idea.

I'm not sure what the right solution is, but I would urge that the solution ensures that there are clear ways both to accumulate metadata fields / defaults values as well as to override (perhaps with a special flag, if need be). Please don't remove an existing method of overriding values without providing an alternative.

+1 to making --include-in-header append to header-includes rather than overwriting it.

This issue has been open for years, and I agree with the commenters who say it would be great to be able to combine metadata from header-includes with the command-line option --include-in-header. One situation where it would be highly beneficial, because the workarounds are not handy, is beamer slides.

If you need a command across the whole document, you cannot define it in the markdown body, since the body is wrapped in \begin{frame}/\end{frame} and the scope of the command is that frame. So you need to define it in the header. But if you are already including some standard header file (that you use for many sets of slides) from the command line, you cannot put your new command in the header under header-includes. The only solution you get is to create a new file to be included from the command line. But creating a new file for one or two commands seems a waste of time...

+1 to making --include-in-header append to header-includes rather than overwriting it.

+1 to making --include-in-header append to header-includes rather than overwriting it.
#3139 (comment)
+1 to this suggestion to handle backwards compatibility.

bpj commented

It is pretty easy to modify your local (possibly local default) templates for any sets of includes which don’t interact with --include-in-header, for example in your latex template:

after ]{$documentclass$}

$for(before-header.latex)$
$before-header.latex$
$endfor$

and then just before $for(header-includes)$

$for(after-header.latex)$
$after-header.latex$
$endfor$

(See Templates in the Pandoc manual!)

and then in your metadata:

before-header:
  latex:
    - |
      ```{=latex}
      \usepackage{mypackage}
      \include{myinclude.ltx}
      ```
  html:
    - |
      ```{=html}
      <link rel="stylesheet" href="mystyle.css" />
      ```
after-header:
  html:
    - |
      ```{=html}
      <style>
      .red { color: red }
      </style>
      ```

Note that you still have to wrap the actual markup in raw blocks to avoid Markdown escaping! Or you could put it in a defaults file like this to avoid that:

(See Defaults files in the Pandoc manual!)

variables:
  before-header:
    latex:
      - |
        \usepackage{mypackage}
        \include{myinclude.ltx}
    html:
      - |
        <link rel="stylesheet" href="mystyle.css" />
  after-header:
    html:
      - |
        <style>
        .red { color: red }
        </style>

You can of course also have $for(before-body)$ etc. or organize it even more hierarchically however you want:

  includes:
    latex:
      header:
        before:
          # list of markup blocks
        after:
          # list of markup blocks
      body:
        before:
          # list of markup blocks
        after:
          # list of markup blocks
    html:
      # You get the idea

and then in your latex template

$for(includes.latex.header.before)$
$includes.latex.header.before$
$endfor$

etc.

Of course in your html template you could have

$for(link-css)$
<link rel="stylesheet" href="$link-css$" />
$endfor$
$if(css-style)$
<style>
$for(css-style)$
$css-style$
$endfor$
</style>
$endif$

and then in your defaults file

variables:
  link-css:
    - mystyle.css
  css-style:
    - |
      .red { color: red }

In other words you can include anything from metadata and variables you like however you like in your templates, and as long as you use other names than those variables which pandoc associates with its commandline options your custom fields will be totally independent. In my custom latex template I for example have:

$for(ltx-pkg)$
$if(ltx-pkg.opts)$
\usepackage[$--
$for(ltx-pkg.opts)$
  $ltx-pkg.opts$$sep$,
$endfor$
]{$ltx-pkg.name$}
$else$
\usepackage{$if(ltx-pkg.name)$$ltx-pkg.name$$else$$ltx-pkg$$endif$}
$endif$
$endfor$

which with this in a defaults file

variables:
  ltx-pkg:
    - foopack
    - name: ulem
      opts: normalem
    - name: mypack
      opts:
        - tic=tac
        - toc
        - tuc
    - name: barpack

gives this in my latex header

\usepackage{foopack}
\usepackage[
  normalem]{ulem}
\usepackage[
  tic=tac,
  toc,
  tuc]{mypack}
\usepackage{barpack}
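
For readers less used to pandoc's template syntax, the same expansion can be mimicked in plain Python. This is only an emulation written to reproduce the listing above; `render_usepackage` is a hypothetical helper, and the handling of a bare string versus a name/opts mapping is an assumption based on the defaults file shown.

```python
def render_usepackage(packages):
    """Emulate the $for(ltx-pkg)$ loop for a list of package entries.

    Each entry is either a bare package name (str) or a dict with a
    "name" and optional "opts" (a single string or a list of strings).
    """
    lines = []
    for pkg in packages:
        if isinstance(pkg, str):
            lines.append("\\usepackage{" + pkg + "}")
            continue
        name = pkg["name"]
        opts = pkg.get("opts")
        if not opts:
            lines.append("\\usepackage{" + name + "}")
            continue
        if isinstance(opts, str):
            opts = [opts]  # a scalar behaves like a one-item list
        lines.append("\\usepackage[")
        # Options are indented and comma-separated, one per line.
        lines.append(",\n".join("  " + o for o in opts) + "]{" + name + "}")
    return "\n".join(lines)


ltx_pkg = [
    "foopack",
    {"name": "ulem", "opts": "normalem"},
    {"name": "mypack", "opts": ["tic=tac", "toc", "tuc"]},
    {"name": "barpack"},
]
print(render_usepackage(ltx_pkg))
```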

Now I guess someone ought to make a PR for adding all those $for(includes.format.header.before)$ etc. to all the builtin templates, and I guess that could be me, but frankly it's a lot of work because there are a lot of templates for a lot of formats, and I don’t even know what it would look like for all formats, or where it makes sense to place them for which formats. Maybe I could create a branch for this in my pandoc repository fork and ask people to contribute, but I guess that should include tests, and I don’t even know where to start with those!