jgm/pandoc

Add caption option for code blocks

miekg opened this issue · 19 comments

It would be nice to have some way to specify a caption for code blocks in the style of the table caption.

    codeline1
    codeline2
    codeline3

Code: caption explaining the code above

Looking at what DocBook (for instance) expects, a better approach would be to have a figure wrap a code block, e.g. something like the following (non-working) code:

![This is the caption](

A verbatim code block
jkasjksajassjasjsajsajkas

)

If you use fenced code blocks, you can add attributes to the code block:

~~~{#test .haskell caption="asdf fdsa"}
asdf
~~~

If you compile to LaTeX, the caption shows up correctly in the output (at least if you compile with the --listings flag):

\begin{lstlisting}[language=Haskell, caption=asdf fdsa]
asdf
\end{lstlisting}

If you compile to HTML, caption="asdf fdsa" is added as an attribute of the <pre> element, but it isn't visible in the rendered document:

<pre class="sourceCode haskell" id="test" caption="asdf fdsa">
asdf
</pre>

But this is more of a hack than a solution.

jgm commented

You could add a little javascript to make these captions appear in HTML; perhaps it can also be done with pure CSS?

Good catch! Yes, you can even do it with CSS:

pre:after {
  content: attr(caption);
  font-weight:bold;
}
<pre caption="ASDF">
asdf
</pre>

One more thing: caption is not a valid HTML attribute. What do you think of prefixing all attributes with data- so they become HTML5 data attributes? Maybe with a whitelist of valid attributes that are left unprefixed?

Sorry to revisit this. My main "complaint" isn't so much that it is impossible to use a caption, but that the caption technique differs so much from the one in use for tables.

I have the same issue: using caption causes EPUB verification to fail.

jgm commented

EPUB contents must be valid XHTML. Try using "data-caption" instead of "caption". (At least, this should work in EPUB3; I don't know about EPUB2.)

+++ James Turnbull [Jul 17 14 08:04]:

I have the same issue - the use of caption causes epub verification to fail.

When I build to PDF with data-caption specified inside a fenced code block, I get:

00-frontsmatter.tex:297: Package xkeyval Error: `data-caption' undefined in families `lst'.
00-frontsmatter.tex:297: leading text: ...ge=bash , data-caption=Sample code block]

I am trying to build both PDF and EPUB for the project. I guess I can sed it before I build the EPUB.
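Instead of running sed before the EPUB build, a JSON filter can make the rename depend on the output format, because pandoc passes the target format to a JSON filter as its first argument. The sketch below is an illustration under assumptions, not an existing pandoc feature: the file name `caption_attr.py` and the `HTML_LIKE` set of formats are hypothetical. It renames `caption` to `data-caption` only for HTML/EPUB output, so a LaTeX build with --listings still sees `caption`.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter (hypothetical file name: caption_attr.py)."""
import json
import sys

# Assumption: output formats that need valid (X)HTML attributes.
HTML_LIKE = {"html", "html4", "html5", "epub", "epub2", "epub3"}


def rename_caption(node):
    """Walk the JSON AST and rename `caption` to `data-caption` on every CodeBlock."""
    if isinstance(node, list):
        for item in node:
            rename_caption(item)
    elif isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            # A CodeBlock's "c" is [[identifier, classes, key_values], code_text].
            attr = node["c"][0]
            attr[2] = [
                ["data-caption" if key == "caption" else key, value]
                for key, value in attr[2]
            ]
        # Recurse so code blocks nested in lists, divs, etc. are handled too.
        for value in node.values():
            rename_caption(value)


def main(target_format):
    doc = json.load(sys.stdin)
    if target_format in HTML_LIKE:
        rename_caption(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc runs JSON filters with the output format as the first argument.
    main(sys.argv[1])
```

It would be used as, for example, `pandoc --filter ./caption_attr.py -o book.epub book.md`; for the PDF build the filter can be omitted, or left in, since `latex` is not in the set.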

I like the proposal for the markup syntax. For the rendering, I think either (i) the HTML writer should special-case this attribute and convert it to data-caption (maybe the EPUB and other writers too); or (ii), more complex, all those writers should check a whitelist and prefix every unrecognized attribute with data-, as @nougad suggested; or, least preferred, (iii) it should be stored internally as data-caption (though perhaps the reader would accept caption= and silently convert it for you), and the LaTeX writer would do the special checking and convert it from data-caption (back) to caption.
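Option (ii) comes down to a small transformation of a code block's key/value attributes. A minimal sketch, assuming a hypothetical `HTML_WHITELIST` (a real list would be derived from the HTML specification) and a hypothetical function name `prefix_unknown`; identifiers and classes live in separate fields of pandoc's attribute triple, so they are not affected:

```python
# Hypothetical whitelist: attributes assumed valid as-is in HTML output.
HTML_WHITELIST = frozenset({"title", "lang", "dir", "style"})


def prefix_unknown(key_values, whitelist=HTML_WHITELIST):
    """Return key/value pairs with every non-whitelisted key prefixed by "data-".

    Keys that already start with "data-" are kept as-is, so running the
    function twice gives the same result.
    """
    return [
        [key if key in whitelist or key.startswith("data-") else "data-" + key, value]
        for key, value in key_values
    ]


if __name__ == "__main__":
    attrs = [["caption", "Sample code block"], ["title", "hello.sh"]]
    print(prefix_unknown(attrs))
    # prints [['data-caption', 'Sample code block'], ['title', 'hello.sh']]
```

Because it is idempotent, a writer could apply it unconditionally, whether or not a user already typed data-caption.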

Images and tables can have captions as well, and those captions can contain markup (unlike the suggested attribute-based solution). I'd prefer the common syntax of table captions:

    echo "Hello World"!

: Example of a *Hello World* program

My current workaround is a filter that takes code blocks followed by paragraphs starting with : and converts them into a 1x1 table (with a caption) that contains the code block:

<table>
<caption>Example of a <em>Hello World</em> program</caption>
<tbody>
<tr class="odd">
<td align="left">
<pre>
echo "Hello World"!
</pre>
</td>
</tr>
</tbody>
</table>

As far as I can tell, it is not possible to create such a 1x1 multiline table containing a code block in Markdown syntax.
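The workaround filter can be sketched against pandoc's JSON AST. This version assumes pandoc 3.x, whose AST has a `Figure` block, and wraps the code block in a figure instead of a 1x1 table, keeping the caption's inline markup (such as the *Hello World* emphasis). The function names are hypothetical, and only top-level blocks are inspected, so code blocks inside lists or divs are left alone:

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: a code block followed by a ': caption'
paragraph becomes a Figure (assumes the pandoc 3.x AST)."""
import json
import sys


def is_caption_para(block):
    """True if the block is a Para whose first inline is a Str starting with ':'."""
    if block.get("t") != "Para" or not block["c"]:
        return False
    first = block["c"][0]
    return first.get("t") == "Str" and first["c"].startswith(":")


def caption_inlines(para):
    """Return the paragraph's inlines without the leading ':' marker."""
    inlines = list(para["c"])
    first = inlines[0]["c"]
    if first == ":":
        # ': Example' is typically parsed as Str ':' then Space; drop both.
        inlines = inlines[1:]
        if inlines and inlines[0].get("t") == "Space":
            inlines = inlines[1:]
    else:
        # ':Example' is a single Str; strip only the colon.
        inlines[0] = {"t": "Str", "c": first[1:]}
    return inlines


def attach_captions(blocks):
    """Replace each CodeBlock + caption Para pair with a Figure (top level only)."""
    out = []
    i = 0
    while i < len(blocks):
        block = blocks[i]
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        if block.get("t") == "CodeBlock" and nxt is not None and is_caption_para(nxt):
            # Figure content: [attr, [short_caption_or_null, [caption blocks]], [body blocks]].
            caption = [None, [{"t": "Plain", "c": caption_inlines(nxt)}]]
            out.append({"t": "Figure", "c": [["", [], []], caption, [block]]})
            i += 2
        else:
            out.append(block)
            i += 1
    return out


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = attach_captions(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc passes the output format as argv[1]; this filter ignores it.
    main()
```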

Can we agree that extending the table caption syntax to code blocks is probably the best idea so far?

For LaTeX output, we can probably leverage the float package by defining a new float environment for captioned code blocks (when not using listings). For HTML-based output, adding a caption is not an issue. Other formats would differ greatly, but it is doable in theory.

Thoughts?
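For the LaTeX side of the float idea, a filter could emit raw LaTeX that uses a new float type. A sketch, assuming a float named `codelisting` (hypothetical) declared in the preamble, e.g. via --include-in-header; the hypothetical Python helpers only build the string a filter would place in a raw `latex` block:

```python
# Assumed preamble (hypothetical float name "codelisting"), e.g. saved to a
# file and passed to pandoc with --include-in-header.
PREAMBLE = r"""
\usepackage{float}
\newfloat{codelisting}{htbp}{lop}
\floatname{codelisting}{Listing}
"""

# Characters that must be escaped in ordinary LaTeX text (the caption).
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters in caption text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def captioned_code_latex(code, caption):
    """Build a codelisting float holding a caption and a verbatim block.

    The code is not escaped because verbatim prints it literally; it must
    not contain the line \\end{verbatim}.
    """
    return (
        "\\begin{codelisting}\n"
        f"\\caption{{{escape_latex(caption)}}}\n"
        "\\begin{verbatim}\n"
        f"{code}\n"
        "\\end{verbatim}\n"
        "\\end{codelisting}"
    )
```

In a JSON filter, the result would go into a raw block such as `{"t": "RawBlock", "c": ["latex", captioned_code_latex(code, caption)]}`. Putting the caption first places it above the code, in the style of table captions.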

From the HTML specification:

The figure element represents some flow content, optionally with a caption, that is self-contained (like a complete sentence) and is typically referenced as a single unit from the main flow of the document.

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc.

Could we have pandoc produce something like this:

$ pandoc -t html5 <<EOF
~~~{.python caption="Your first program"}
print("Hello.")
~~~
EOF
<figure>
<pre class="sourceCode python"><code class="sourceCode python"><span class="dt">print</span>(<span class="st">&quot;Hello.&quot;</span>)</code></pre>
<figcaption>Your first program</figcaption>
</figure>

Note: this approach would be consistent with how images are already handled:

$ pandoc -t html5 <<EOF
![foo](bar.png)
EOF
<figure>
<img src="bar.png" alt="foo" /><figcaption>foo</figcaption>
</figure>
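The proposed HTML output needs little more than escaping. A minimal sketch with a hypothetical helper name, `code_figure`, that wraps an already-extracted code string in `<figure>`/`<figcaption>`; unlike pandoc's real writer, it does no syntax highlighting:

```python
import html


def code_figure(code, caption, language=""):
    """Wrap a code string in <figure> with a <figcaption>, escaping all text.

    Hypothetical helper: no syntax highlighting, unlike pandoc's HTML writer.
    """
    pre_class = f' class="sourceCode {html.escape(language)}"' if language else ""
    return (
        "<figure>\n"
        f"<pre{pre_class}><code>{html.escape(code)}</code></pre>\n"
        f"<figcaption>{html.escape(caption)}</figcaption>\n"
        "</figure>"
    )


if __name__ == "__main__":
    print(code_figure('print("Hello.")', "Your first program", "python"))
```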

I think it should be as universal as possible. I just ran into the need to caption a mathematical formula (display math, as in $$…$$). It would be great if this worked in the same style as for tables, images, and code.

@lierdakil: Using float seems like a good idea; it could easily work for the other cases too (like my formula)!

I just encountered the problem needing to caption a mathematical formula
(display math, as in $$…$$).

Did you try the LaTeX \tag{} command?
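For reference, a display equation labeled with \tag{} looks like this. It assumes amsmath in LaTeX output (pandoc's default LaTeX template loads it) or MathJax with AMS support in HTML; other math rendering methods may not understand \tag. The label is typeset in parentheses beside the equation rather than as a caption above or below it:

```latex
$$
e^{i\pi} + 1 = 0 \tag{Euler's identity}
$$
```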

Captions for mathematical formulas work very differently from captions
for figures, tables, and code. For example:

  +--------------------+
  | Equation goes here |  (Caption goes here)
  +--------------------+

instead of

  +------------------+
  | Figure goes here |
  +------------------+

    Caption goes here

and

   Caption goes here

  +-----------------+
  | Table goes here |
  +-----------------+

Moreover, figures, tables, and code have a well-established standard to
comply with, but math does not: some people use MathML, others use
MathJax, and some still use images for mathematical formulas.

Are there any updates on this?

mb21 commented

@aparcar You can use something like the syntax proposed by @rgaiacs, then write a pandoc filter...
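Such a filter for @rgaiacs's `caption="Your first program"` syntax could look like the sketch below. It assumes pandoc 3.x's `Figure` block, and the function names are hypothetical; the caption is treated as plain text, so markup inside the attribute value is not parsed:

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: CodeBlock with a caption attribute -> Figure."""
import json
import sys


def words_to_inlines(text):
    """Turn a plain caption string into pandoc inlines: Str words separated by Space."""
    inlines = []
    for word in text.split():
        if inlines:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def figure_from_caption_attr(block):
    """Wrap a CodeBlock that has a caption attribute in a Figure; return other blocks unchanged."""
    if block.get("t") != "CodeBlock":
        return block
    (ident, classes, key_values), code = block["c"]
    captions = [value for key, value in key_values if key == "caption"]
    if not captions:
        return block
    # Drop the caption attribute from the code block itself.
    rest = [[key, value] for key, value in key_values if key != "caption"]
    code_block = {"t": "CodeBlock", "c": [[ident, classes, rest], code]}
    # Figure content: [attr, [short_caption_or_null, [caption blocks]], [body blocks]].
    caption = [None, [{"t": "Plain", "c": words_to_inlines(captions[0])}]]
    return {"t": "Figure", "c": [["", [], []], caption, [code_block]]}


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = [figure_from_caption_attr(b) for b in doc["blocks"]]
    json.dump(doc, sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc passes the output format as argv[1]; this filter ignores it.
    main()
```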

Or, you know, pandoc-crossref supports listings, which may or may not suit your particular use case.

I just proposed, in #9261, a unified way to define <caption> and <figcaption> for images, media elements (<audio> and <video>), and even the <pre> element.