jgm/pandoc

Explicit Figure element in Block

jgm opened this issue · 65 comments

jgm commented

Currently we represent figures in the AST using this hack: a figure is an Image whose title attribute starts with fig: and which is by itself in a Para.
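For readers who work with the AST from filters, here is a minimal sketch of how that convention can be detected. It assumes the JSON encoding that pandoc emits with `-t json` (objects with `t` and `c` keys); the function name `is_implicit_figure` and the `image` helper are made up for illustration and are not part of pandoc.

```python
def is_implicit_figure(block):
    """True if `block` is a Para holding exactly one Image whose title
    starts with "fig:" (the convention described above)."""
    if block.get("t") != "Para":
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return False
    # Image contents: [attr, alt inlines, [src, title]]
    _attr, _alt, (_src, title) = inlines[0]["c"]
    return title.startswith("fig:")


def image(src, title="", alt=""):
    # Hypothetical helper that builds an Image node for the examples below.
    return {"t": "Image",
            "c": [["", [], []], [{"t": "Str", "c": alt}], [src, title]]}


figure_para = {"t": "Para", "c": [image("img.jpg", "fig:", "my image")]}
two_images = {"t": "Para",
              "c": [image("a.jpg", "fig:"), {"t": "Space"}, image("b.jpg", "fig:")]}
```

Note that `two_images` is not recognized, which is exactly the limitation discussed in the comments that follow.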

Short of a full-featured figure environment in the AST, it would make sense to move to a less hacky representation: a Div with class figure containing the image (which need not have a title starting with fig:). This would involve changes to readers and writers.

Indeed, if we did this, we could support figures containing multiple images, via explicit Divs.

jgm commented

Another advantage is that attributes could be added explicitly to the Div.

<div class="figure floatRight">
![my image](img.jpg){.imageclass}
</div>

See #3094.

Seconded, I just spent an hour figuring out why multiple images in a paragraph don't create a figure. Is there any way to create a figure with multiple images right now? (I guess creating Rawblocks will work?)

Relevant code is here? https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/HTML.hs#L450 Why is the match only for one image and not multiple ones in the first place?

jgm commented

If multiple images were allowed, which one would form the figure's caption? (There is only one caption.) What would determine how the images are arrayed in the figure? In retrospect, some kind of more explicit syntax for figures would have been desirable, and maybe that's the direction we should move in.

Layout: Hm, I only use html and latex backends, but both of those handle multiple images in a figure in a reasonable way without extra specification. However, in latex IIRC it makes a difference if there is a SoftBreak between images (break vs. no break). IMHO it would be up to the writer/backend to break up figures if multiple images are not supported.

Caption: None unless explicitly stated? That's legal in both html and latex iirc. However I believe Paras do not support extra data like attrs, so a div or figure node would be easier.

jgm commented

+++ Hauke Rehfeld [Oct 24 16 16:42 ]:

Seconded, I just spent an hour figuring out why multiple images in a
paragraph don't create a figure. Is there any way to create a figure
right now (I guess creating Rawblocks will work?)

You can paste the images together into one image, I suppose,
or use a filter.

Relevant code is here?
https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/HTML.hs#L450
Why is the match only for one image and not multiple ones in
the first place?

Good question. I suppose that since I'm in a field where
pictorial figures aren't used much, I didn't realize how
common figures with multiple images and a single caption
are.

But how would it work to allow a paragraph with multiple
images (and nothing else) to form a figure? From which
image would the figure's caption be taken? What determines
how the images are arranged in the figure (side by side,
in a square, etc.)?

Really we need a more explicit syntax for figures if this
kind of thing is going to be allowed.

jgm commented

I think we need an explicit Block element for Figure.

mb21 commented

The question is whether this figure element should only contain images, or if it should be a general floating-container more analogous to the LaTeX \begin{figure} and HTML5 figure elements (emphasis added):

Usually a <figure> is an image, illustration, diagram, code snippet, etc., that is referenced in the main flow of a document, but that can be moved to another part of the document or to an appendix without affecting the main flow.

If so, the figure element should contain a caption (multiple paragraphs allowed) and arbitrary block content:

Figure Attr [Block] [Block]
jgm commented

Development on figures branch for pandoc, pandoc-types.

jgm commented

Thinking about explicit Markdown syntaxes for figures.

If we had a native syntax for divs, we could treat any div with a following caption as a figure:

;--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)
;---
^^  [This is the optional short caption.]
    This is the long caption. It can span multiple blocks. 
    Syntax like footnotes.  

    Subsequent paragraphs indented.  We automatically treat a
    div as a figure if it is followed by a caption.  Or is it
    too confusing if the caption comes outside the div?

Another possibility would be to have a special kind of marking for figures, like:

!--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)

[This is the optional short caption.]
This is the long caption. It can span multiple blocks. 
Syntax like footnotes.  

In this version, all the Para elements at the end of the
structure are treated as the caption,
so we don't need an explicit syntax to mark the caption.
!---

I'm trying to avoid syntaxes that require you to use the word figure.

I like the ^^ syntax for attaching captions; we might want to use this for tables as well (and code blocks) if we use it for this.

jgm commented

TODO on figures branch on pandoc and pandoc-citeproc

  • Finish updating writers to handle Figure
  • Update readers
    • RST
    • Markdown
    • LaTeX
    • HTML
    • Org/Blocks
    • MediaWiki
  • Get everything to compile
  • Update tests
  • Update ToJSON, FromJSON instance in pandoc-types to include Figure, Caption
  • Update Arbitrary in pandoc-types (it lacks Div, Figure, Span, probably others)
  • Markdown syntax, new extension?
  • Figure numbering and internal refs?
jgm commented

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats. E.g. in docbook, only certain elements are allowed inside a figure element. http://tdg.docbook.org/tdg/4.5/figure.html

Perhaps instead the contents should be limited to a list of images (or perhaps a list of lists of images, so they can be organized on lines? -- though it may be better to let the layout happen automatically, given width information).

Perhaps listings could go in figures as well?

jgm commented

I think I'm going to remove this from the 2.0 milestone as it still needs more thought.

mb21 commented

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats.

I still think taking the most general approach in the AST makes sense. There are always going to be some formats that don't support certain things, but that should be handled by the respective writers and the AST design shouldn't be held up by those. It would be great to have a general block figure element to output to HTML/ePUB/LaTeX...

mb21 commented

Concerning the caption syntax, I kind of prefer the second one, since it is clearly placed inside the figure/div element. A third variant:

;--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)
;--
This is the long caption. It can span multiple blocks. 

Syntax like footnotes.  
;--
This is the optional short caption.
Since it's optional, it needs to go at the end in this syntax.
;---
jgm commented

What kinds of things do people really put in figures, besides images?

mb21 commented

Maybe the element I have in mind is more of a Float than a Figure.

Again the MDN extract posted above:

Usually a <figure> is an image, illustration, diagram, code snippet, etc., that is referenced in the main flow of a document, but that can be moved to another part of the document or to an appendix without affecting the main flow.

And from Wikibooks LaTeX/Floats:

Floats are containers for things in a document that cannot be broken over a page. LaTeX by default recognizes "table" and "figure" floats, but you can define new ones of your own (see Custom floats below). Floats are there to deal with the problem of the object that won't fit on the present page, and to help when you really don't want the object here just now.

Floats are not part of the normal stream of text, but separate entities, positioned in a part of the page to themselves (top, middle, bottom, left, right, or wherever the designer specifies). They always have a caption describing them and they are always numbered so they can be referred to from elsewhere in the text.

Usually it's tables and images that are floated, but it could also be source code, a poem, some sort of aside box etc. Even Docbook has a sidebar element.

Maybe the table AST element shouldn't have a caption, only the Figure element should have a caption. Current markdown table syntax with captions would be converted to Figure attr caption [Table a]. With the attr specifying whether the figure should float or be at that fixed position in the text, plus whether to list it in the list of figures/list of tables etc.

Summarizing, a Figure is an element that:

  • usually has a caption/title
  • can be listed in "List of figures" or similar TOC-like entity
  • can be referenced from other parts of the text (see #813), and
  • may or may not float (which is actually a layout decision).

I think it would be great to have the figure type in the AST for pandoc 2.0. Writing the code for the writers and reference generators etc. can be done later.

mb21 commented

I'm leaning currently towards the second option (general float/caption container). Use cases include floating more than just images (e.g. float two tables that share a caption), or having one figure with a caption, that contains subfigures (or images) with each having a caption, e.g:

It's probably true that it gets a bit trickier to consider all cases in all writers, but it is a more flexible option.

That would certainly be "clean" w.r.t. LaTeX/figures/subfigures, but how would that nesting be expressed in, say, pandoc markdown?

For a thinking exercise, a first attempt at extending the syntax musing by @jgm above, for a figure with 3 subfigures (i.e. from @mb21's example):

!--- {#foo .right}
!--- {#a .left}
![seagull](gull.jpg){.imageclass}
[A gull]
A grey gull of the genus Rattus Avian.

Many people don't like gulls, but they're a pretty impressive bird for a rat.
!---

!--- {#b .mid}
![tiger](tiger.jpg){.imageclass}
[A tiger]
A sumatran tiger
!---

!--- {#c .right}
![mouse](mouse.jpg){.imageclass}
[A mouse]
A tiny, little squeaky toy for cats
!---

[Pictures of Animals]
Pictures of Animals.  These are commonly recognised examples of Earth species.
!---

In this case, the figure caption text is after the content definition for consistency with the subfigures.

It feels workable, albeit a little clunky.

I’d like to loop in #4737 to this discussion and point out that, for accessibility purposes, whatever implementation is decided upon must allow the alt text to be separate from the caption, which is currently not possible.

mb21 commented

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats.

I still think taking the most general approach in the AST makes sense. There are always going to be some formats that don't support certain things, but that should be handled by the respective writers and the AST design shouldn't be held up by those. It would be great to have a general block figure element to output to HTML/ePUB/LaTeX...

I could have a look at the figures branch and see whether I can finish the work.

One thing I noticed is that the writers match on:

Figure _attr (Caption _short long) [Para [Image _imgattr alt (src,tit)]]

which I'd expect the HTML writer to render to <figure><p><img ...

Shouldn't the canonical representation of a figure with image(s) be:

Figure _attr (Caption _short long) [Plain [Image _imgattr alt (src,tit)]]
jgm commented

One possibility (which wouldn't require a new Figure element in the AST) would be to adopt some implicit conventions for treating divs as figures. (This could be attached to an extension like div_figures or figure_divs.) It would be nice to avoid having to use an English-language label like figure, so the thought is to let the structure of the div show that it's a figure. A div would be treated as a figure if it starts with a paragraph containing one or more images (and nothing else). The remainder of the div would then be treated as the main caption.

::: {#animals}
![A gull.](gull.jpg "A gull.")
![A sumatran tiger.](tiger.jpg "A tiger."){width=50%}
![A mouse.](mouse.jpg "A mouse.")

Pictures of Animals.  These are commonly recognised examples of Earth species.

Here's a second paragraph.
:::
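The rule just described (a div is a figure if its first block is a paragraph containing only images, and everything after that is the caption) could be checked along these lines. This is only a sketch over pandoc's JSON AST encoding; `image_only_para` and `split_figure_div` are hypothetical names, and treating `Space` and `SoftBreak` between images as harmless is an assumption.

```python
def image_only_para(block):
    """Return the Images of a Para that contains nothing but images
    (ignoring Space/SoftBreak between them), else None."""
    if block.get("t") != "Para":
        return None
    inlines = block["c"]
    images = [i for i in inlines if i["t"] == "Image"]
    others = [i for i in inlines
              if i["t"] not in ("Image", "Space", "SoftBreak")]
    if images and not others:
        return images
    return None


def split_figure_div(block):
    """Return (images, caption_blocks) if `block` is a Div that follows
    the convention, else None."""
    if block.get("t") != "Div":
        return None
    _attr, blocks = block["c"]
    if not blocks:
        return None
    images = image_only_para(blocks[0])
    if images is None:
        return None
    # Everything after the leading image paragraph is the caption.
    return images, blocks[1:]
```

For the `#animals` example above, this would yield three images and a two-paragraph caption.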

For formats that allow subcaptions, the image alt text could be used as before. Or perhaps we could use the title for this instead (which would leave alt text for its normal intended purpose). The only difficulty with using the title is that it doesn't allow markdown formatting (and especially math), but perhaps we could reparse the string contents of the title.

An alternative, more explicit approach to subcaptions would be to allow a list of images:

::: {#animals}
- ![A gull.](gull.jpg)
- ![A tiger.](tiger.jpg)
- ![A mouse.](mouse.jpg)

Pictures of Animals.  These are commonly recognised examples of Earth species.

Here's a second paragraph.
:::

I like the more explicit approach because it allows you to have multiple images without subcaptions, which is sometimes desirable. A variant of this approach would be to take the subcaptions from paragraph(s) following the image in the list items.

::: {#animals}
- ![A gull.](gull.jpg)

  A gull.

- ![A tiger.](tiger.jpg)

  A sumatran tiger.

- ![A mouse.](mouse.jpg)

  A mouse.

Pictures of Animals.  These are commonly recognised examples of Earth species.

Here's a second paragraph.
:::

We could also adopt the convention that the first sentence of the caption is used as a "short caption" (used in table of contents, etc.) (#2978, #4409). Or, we could just look for an optional shortcaption attribute on the div. Or we could interpret a span at the beginning of the caption as the short caption, as in @mikecee's example above.

A similar approach could be used to support captioned code listings (#673).

::: {#mycode}
``` haskell
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
```

Lazily generated Fibonacci sequence in Haskell.
:::

And we could generalize the same idea to tables. A table outside a div would be captionless. To get a table with a caption, put it in a div with the caption following:

::: {#mytab}
  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Demonstration of simple table syntax.

Look, ma, a two-paragraph caption!
:::

Implementation. There are several options:

  1. Just have the readers parse figures into this kind of div, and leave it to the writers to recognize these conventions and render the divs appropriately.

  2. Have the markdown reader recognize this convention and produce a div with a special class (figure?), which would then be recognized by the writers.

  3. Introduce a new AST block element, Figure, and have the readers produce this.

mb21 commented

While I agree that the figure syntax should probably resemble the native div syntax, I think we should be careful to avoid mixing the discussion about the AST design on one hand, and markdown syntax on the other hand.

AST

What the AST should be able to represent is a kind of figure with:

  • [Block] content
  • [Block] caption
  • [Inline] short caption (for \listoffigures)
  • figure should be referenceable (i.e. have an attribute with id)

What differentiates the above from the div, that we already have, is the captions. While we could certainly use a div and certain conventions, I'm not convinced this would be much better long-term than what we currently have for a figure: an image in a paragraph and certain conventions.
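To make the four requirements concrete, here is a rough sketch of such a node as a Python dataclass. It is not the pandoc-types definition; the field names, and the use of plain dicts as stand-ins for Block and Inline values, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Block = dict   # stand-in for a block in pandoc's JSON encoding
Inline = dict  # stand-in for an inline in pandoc's JSON encoding
Attr = Tuple[str, List[str], List[Tuple[str, str]]]  # (id, classes, key-values)


@dataclass
class Figure:
    attr: Attr                          # carries the id used for references
    content: List[Block]                # arbitrary block content
    caption: List[Block] = field(default_factory=list)
    short_caption: List[Inline] = field(default_factory=list)

    @property
    def identifier(self) -> str:
        return self.attr[0]

    def is_referenceable(self) -> bool:
        # A figure can only be referred to if it has a non-empty id.
        return bool(self.identifier)
```

The point of the separate `caption` and `short_caption` fields is that a consumer never has to guess where the content ends and the caption starts.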

Subfigures

Arguably, the conceptually simplest way to get proper subfigures is to simply nest figure elements, just like it's recommended in HTML (they specifically caution against abusing the title attribute for subfigure captions). Also, this...

The only difficulty with using the title is that it doesn't allow markdown formatting (and especially math), but perhaps we could reparse the string contents of the title.

...sounds like another hack that would make things especially hard for filter authors (and writers).

At first read, I liked the idea of using a list-like syntax for the subfigures. But the more I think about it, the stranger it seems to use both a div and a list, since a list itself is already a container with [Block] content. It seems like something starting with:

::: {#animals}
- ![](gull.jpg)

should rather render to:

<figure id="animals">
  <ul>
    <li><img src="gull.jpg">

Again, my mental model for this is the HTML <figure> and LaTeX \begin{figure} concepts (which I expect a lot of people to be familiar with). Both can contain arbitrary block content (plus a caption).

Markdown syntax

As I mentioned, I think we should settle on an AST definition first, then look at a markdown syntax. (Syntax discussions tend to be fairly subjective and opinionated...) But since we're already here, echoing the first few proposals in this thread:

:::
![](gull.jpg)
^ A gull
:::

The idea is to have some specific marker that separates the contents from the caption. (In a way, similar to the > for lazy blockquotes.) As opposed to divs, the class shouldn't be required following the :::. Thus it would probably be easier to parse, if the caption marker were above the content (in the following variant, using an underscore as the marker):

:::
_ A gull
![](gull.jpg)
:::

Nested figures and captions:

::::: {#animals}
:::
![](gull.jpg)
^ A gull
:::

:::
![](tiger.jpg){width=50%}
^ [A short caption for the tiger]
  A long caption for the tiger subfigure
:::

:::
![](mouse.jpg)
^ A mouse
:::

^ Pictures of Animals. These are commonly recognised examples of
  Earth species.

  Here's a second paragraph.
:::::
jgm commented

I think I'm persuaded that it makes more sense to use nested figure syntax for the "subfigure" case. (Btw, how is the figure with subfigures in the thread above represented in LaTeX? Is it nested figure environments?)

I don't see any particular advantage in having a special syntax for the caption (your ^). I'd much rather just have the convention that the images go first, and the rest is the caption. If we want a short caption, we could have the convention that if the caption begins with a paragraph containing a single span, it's treated as the short caption. This looks very clean. Another advantage of this (like the current implicit_figures syntax) is that it would degrade nicely in processors that don't know about the figure syntax.
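The short-caption convention (a caption whose first paragraph consists of a single span) is simple to test for once the reader has produced the AST. A minimal sketch over pandoc's JSON encoding follows; `split_short_caption` is a hypothetical name, and the sketch assumes the reader has already turned the bracketed text into a `Span`.

```python
def split_short_caption(caption_blocks):
    """Return (short_caption_inlines, remaining_caption_blocks).

    The short caption is None unless the first caption block is a Para
    whose only inline is a Span."""
    if caption_blocks:
        first = caption_blocks[0]
        if (first.get("t") == "Para"
                and len(first["c"]) == 1
                and first["c"][0].get("t") == "Span"):
            # Span contents: [attr, inlines]
            _attr, inlines = first["c"][0]["c"]
            return inlines, caption_blocks[1:]
    return None, caption_blocks
```

A processor that does not know the convention would simply show the span as an ordinary first paragraph, which is the graceful degradation mentioned above.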

While we could certainly use a div and certain conventions, I'm not convinced this would be much better long-term than what we currently have for a figure: an image in a paragraph and certain conventions.

Let's remind ourselves of the putative drawbacks of the current implicit_figures syntax:

  1. can't have multiple images in a figure
  2. can't have block-level content in captions
  3. no way to specify a short caption
  4. no way to just have a paragraph with an inline image, without tricks like adding an empty HTML comment.
  5. no way to have a figure containing block-level content other than an image.
  6. [ADDED LATER:] no proper "alt text" for the captioned images, in cases where the caption does not function as alt text

The div convention I proposed, or a variant of it, would solve 1, 2, and 3. It would have an analogue of problem 4, namely that if you wanted to have a regular div (not a figure) that happened to start with one or more images in a paragraph by themselves, and continue with some block-level text, you'd need to use a special trick to prevent it being interpreted as a figure (e.g. add a nonbreaking space or HTML comment to the paragraph with the images). But how common is that going to be? (If we were worried about it, we could also allow people to specify a .nofigure class.)

It would also have problem 5 -- but is this really a problem? I'm not completely convinced of that. What would go in a figure besides images? Besides, as I note above, some formats (like docbook) restrict what can go in a figure. To be sure, there are other kinds of floats that are needed: tables and code listings, for example. But conceptually these are distinct from figures (e.g. in a book they generally have different labels and numbering sequences). As I note above, the implicit div approach could accommodate all of these. The type of float produced would be determined by whether the div begins with images, a code listing, or a table.
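As an illustration of how the float type could be derived from the first block of the div, including the `.nofigure` opt-out floated above, here is a small sketch over pandoc's JSON AST encoding. The function name `float_kind` and the returned labels are assumptions, not anything pandoc defines.

```python
def float_kind(block):
    """Classify a Div by its first block.

    Returns "figure", "listing", "table", or None for an ordinary div."""
    if block.get("t") != "Div":
        return None
    (_ident, classes, _kvs), blocks = block["c"]
    if "nofigure" in classes or not blocks:
        return None
    first = blocks[0]
    if first["t"] == "CodeBlock":
        return "listing"
    if first["t"] == "Table":
        return "table"
    if first["t"] == "Para":
        kinds = [inline["t"] for inline in first["c"]]
        only_images = all(k in ("Image", "Space", "SoftBreak") for k in kinds)
        if "Image" in kinds and only_images:
            return "figure"
    return None
```

Each kind would then get its own label and numbering sequence in the writers.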

As opposed to divs, the class shouldn't be required following the :::.

The div convention syntax I proposed would require something here. I think that's okay, because we should be strongly encouraging the use of explicit identifiers with figures; otherwise there's no way to refer back to them.

Finally, turning to the AST question. I agree that this is conceptually separate from the syntax question. And I can see arguments on both sides. But I do see some advantage in not introducing new AST elements. (It's really a pain to do it, and it breaks backwards compatibility.) Note that many formats don't really support figures; in these formats, we'd get a natural fallback for free if we just represented these things with Divs in the AST.

Here's an updated version of my proposal for the nested figure case:

:::: {#animals}
::: {#gull}
![A gull.](gull.jpg)

A gull.
:::
::: {#tiger}
![A tiger.](tiger.jpg)

A sumatran tiger.
:::
::: {#mouse}
![A mouse.](mouse.jpg)

A mouse.
:::

[Pictures of animals]

Pictures of Animals.  These are commonly recognised examples of Earth species.

More details about animals in this second paragraph.
:::::
mb21 commented

how is the figure with subfigures in the thread above represented in LaTeX? Is it nested figure environments?

Ha, good question! It's from wikibooks and apparently using \begin{subfigure} from the subcaption package...

As opposed to divs, the class shouldn't be required following the :::.

The div convention syntax I proposed would require something here. I think that's okay, because we should be strongly encouraging the use of explicit identifiers with figures; otherwise there's no way to refer back to them.

Well yes, but in current usage of image figures, surely most of them don't refer back to the image? It's just nice that the figure is there. Or are you proposing to keep the existing implicit_figures extension in place and activated by default, as well?

Probably you're right that multiple images, code listing and tables account for over 95% of things that people would want to put into floats, and the rest could maybe be done using a filter. The examples of the wrapped code listing and table look good, but there the .nofigure class would certainly be needed.

Maybe those cases kind of sum up my hesitancy best: it's just quite hard to figure out when exactly a div is going to be a figure, and where exactly the content ends and the caption starts, and I imagine people having trouble remembering all those rules. I'll have to think about it some more...

jgm commented

Ha, good question! It's from wikibooks and apparently using \begin{subfigure} from the subcaption package...

In that case, the idea of using a list rather than repeating the figure syntax recursively might make sense after all.

As for identifiers: there's currently no very good way to generate a reference back to a figure. But this could change if the figure as a whole (as opposed to the image) had an identifier, as it would with the div proposal.

I'm not sure whether implicit_figures should remain on by default. I'd be tempted to disable it by default, though this might be too much of a breaking change. If it were enabled, we'd have to make sure that it isn't turning images inside figure divs into figures, recursively, but this shouldn't be hard.

it's just quite hard to figure out when exactly a div is going to be a figure, and where exactly the content ends and the caption starts

I thought for a while that we could use a horizontal rule to separate the content from the caption, but then I convinced myself that it looks better without a separator, and a separator isn't really needed.

mb21 commented

In that case, the idea of using a list rather than repeating the figure syntax recursively might make sense after all.

There is still the case for HTML: generating it more easily, and being consistent with it from an authoring perspective. And for subfigures, apparently we'll have to output special markup for LaTeX anyway.

About requiring identifiers: I was trying to say that if the implicit_figures extension were disabled, you would have to give the figure an identifier, even if the most common use case would continue to be not to refer to that identifier.

I think I would be in favour of using the horizontal rule as a separator. It'd kind of serve the same purpose as the ^ in my comment above, e.g.:

::: {}
![](gull.jpg)

---
A gull
:::

And if pandoc were made more CommonMark-compliant, it wouldn't even need the blank line when using e.g. triple underscores instead of triple dashes.

mb21 commented

Maybe we can also get some input from @mfenner on how the floats worked out for scholarlymarkdown. For example, do people actually use Textbox: and put a block of text in a float?

mb21 commented

Looking over this related thread about referencing tables and figures, I have two additions:

  1. We probably should have figures that contain equations, so that equations can be referenced and given captions.

  2. Taking the idea from that thread about having different counters, like fig:, table:, eq:, etc., we could do something like:

    ::: {}
    ![](gull.jpg)
    
    Figure: A long gull caption
    
    With another paragraph. [Short caption.]
    :::
    

    So not hardcoding the English Figure string, but requiring a word without spaces, followed by a colon, followed by the caption. This should help the human reader understand where the caption starts, and might help with auto-generating an id, like #figure:short-capt.

    Maybe a shorthand for a figure with only one image and a one-line caption (i.e. the current case), could simply be the following, as the only thing inside a paragraph:

    ![](gull.jpg)
    Figure: A long gull caption
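The "word without spaces, followed by a colon, followed by the caption" rule can be expressed as a regular expression. The sketch below works on plain caption text rather than on parsed inlines, which is a simplification, and `parse_labelled_caption` is a hypothetical name.

```python
import re

# One run of non-space characters, a colon, whitespace, then the caption.
LABEL_RE = re.compile(r"^(\S+):\s+(.+)$", re.DOTALL)


def parse_labelled_caption(text):
    """Return (label, caption) if `text` starts with a label, else None."""
    match = LABEL_RE.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Because the label is taken from the source, `Abbildung: Eine Möwe` works the same way as `Figure: A gull`, with no localization table needed.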
    
jgm commented

It's fairly rare to have a captioned equation. But numbered equations are common. Currently pandoc doesn't support these very well -- you have to use example lists. Anyway, I think numbered equations are a different topic. The most convenient way to support them, eventually, might be to add support for \label inside tex math. I suppose a captioned equation could just be a figure, in the regular figure numbering sequence. But allowing this would require some more explicit way of marking figures. I still like the implicit approach where we just look at the first thing in the div. I'm not sure the rare case of captioned equations is important enough to support.

With the implicit approach, you don't need special labels; the counters would be determined by the content (listing if it's a code block; figure if it's images). But your idea of using labels is an interesting one. One hesitation is this: if people use fifteen different labels, we'd need fifteen numbering sequences, and we'd need to override a lot of the automatic LaTeX figure machinery, leading to unidiomatic LaTeX.

On the other hand, we have to generate the label (e.g. "Figure") and number (e.g. "1.1") manually in formats other than LaTeX (and maybe a few others). So a case could be made for always generating them manually, even if the result in LaTeX isn't idiomatic. Taking the label from the source itself would remove the need for localization of these labels.

Another thing I realized: it may make sense to encode the numbering in whatever AST element we use for these figures. On this approach, the numbers would be generated in parsing (and could be sensitive to things like LaTeX commands setting the number style). After parsing, we could walk the tree and replace references with (linked) numbers. (If we left attributes indicating the reference target, these could be rendered as \ref{label} in LaTeX, ignoring the number, but the other formats would benefit.)
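A rough sketch of that two-pass idea, under heavy assumptions: floats have already been collected as `(identifier, kind)` pairs in document order, each kind gets its own counter, and a reference is replaced by a `Link` (in pandoc's JSON encoding) whose text is the number. The names `number_floats` and `reference_link` are made up for this example.

```python
from collections import defaultdict


def number_floats(floats):
    """Map each identifier to (kind, number), with one counter per kind.

    `floats` is a list of (identifier, kind) pairs in document order."""
    counters = defaultdict(int)
    numbers = {}
    for identifier, kind in floats:
        counters[kind] += 1
        numbers[identifier] = (kind, counters[kind])
    return numbers


def reference_link(target, numbers):
    """Build a Link inline that shows the number and points at the float."""
    _kind, number = numbers[target]
    return {"t": "Link",
            "c": [["", [], []],
                  [{"t": "Str", "c": str(number)}],
                  ["#" + target, ""]]}
```

A LaTeX writer could ignore the stored number and emit \ref{label} from the link target instead, as suggested above.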

jgm commented

Btw, I'm not wild about autogenerating ids for these. I'd rather require people to specify ids manually. And this would be consistent with current Div syntax, which does require attributes.

I think you have most of the points down. The only thing I can add at the moment is:

  • I'm not sure what pandoc's philosophy is here usually, but only supporting the common denominator between all formats makes pandoc a much less attractive product to convert between formats that you specifically choose for their feature set. If I'm choosing latex and html as the formats, I would definitely expect pandoc to be able to convert figure elements between the two.
  • I'm frequently using pandoc's AST. An implicit figure is much harder to work with than an explicit one. It's the difference between node.classname is Figure and a bunch of nested checks: node has child, child has caption, caption begins with "figure:", etc.

One more thought: maybe it would be nice to have some formal way of generating documentation on what actually gets lost in conversion between formats. Not sure if this is possible with code reflection and without a lot of manual work, but if you could do e.g.: pandoc conversion-loss -t latex -o odt and it would display a list of lossage in the conversion from latex to odt, it would make it much more transparent.

mb21 commented

It's fairly rare to have a captioned equation. But numbered equations are common.

Agreed, but if you google "equation caption", you'll see a lot of people asking for it, including a SO question with 40 upvotes (and google image search for some examples). Admittedly, if you google something you'll inevitably find it, so it may still be a fringe use case, but it does exist.

I'm not wild about autogenerating ids for these. I'd rather require people to specify ids manually.

I agree that's usually for the best, but eventually people will want to generate a toc-like list of figures in HTML, with links to the figures. So eventually someone will probably write a filter to autogenerate the ids of those figures that aren't referenced in the text. Just something to keep in mind...

if people use fifteen different labels, we'd need fifteen numbering sequences, and we'd need to override a lot of the automatic LaTeX figure machinery, leading to unidiomatic LaTeX.

My takeaway from the discussion in #813 was, that generating unidiomatic LaTeX is unfortunately not a good choice, because people need to submit it to journals etc. So I think we're stuck with having to do both: generate idiomatic LaTeX, and reimplement an equivalent logic for HTML etc. However, the fig: prefix in LaTeX is just a convention as well, so maybe the people that need idiomatic LaTeX should just stick to those conventions, even in their markdown.


Anyway, the big question: Should we have figures that act as generic floats for all sorts of content, or should we restrict readers to fall back to divs with class figure, or even strip figures that don't contain 'canonical' content? While I like the idea of keeping things as simple as possible (e.g. in terms of permutations of possible inputs to consider), restricting the markdown input doesn't really solve the problem, since we still have to consider other inputs. For example, what should this output?

pandoc -f html -t markdown
<figure>
  <figcaption>This is a figure of figures.</figcaption>

  <figure>
    <p><span class="math display">\[a^2 + b^2 = c^2\]</span></p>
    <figcaption>Pythagoras' theorem</figcaption>
  </figure>

  <figure>
    <pre><code>
    def hypotenuse_length(a, b):
      return math.sqrt(a*a + b*b)
    </code></pre>
    <figcaption>Python implementation</figcaption>
  </figure>
</figure>
^D

Finally, about the AST question, I guess you're best qualified to decide, since you already started implementing this on the figures branch. I would imagine that it's easier to pattern-match on Figure attr capt blk instead of Div attr ((Para (Image i)):capt). But yes, it would break backwards-compatibility. But maybe we could work towards pandoc 2.5, along with PageBreak and the new Table elements? I'm happy to contribute code once the design is decided.
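
To make the pattern-matching comparison concrete, here is a minimal sketch. The types are simplified stand-ins, not the pandoc-types definitions, and the `Figure` constructor, its field order, and the `figure` class check are assumptions made for illustration only.

```haskell
-- Simplified stand-ins for the pandoc AST; NOT the real pandoc-types definitions.
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Image Attr [Inline] (String, String)
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | Div Attr [Block]
  | Figure Attr [Block] [Block] -- hypothetical: attributes, caption, content
  deriving (Eq, Show)

-- With a dedicated constructor, one pattern identifies a figure.
captionOfFigure :: Block -> Maybe [Block]
captionOfFigure (Figure _ capt _) = Just capt
captionOfFigure _                 = Nothing

-- With the Div encoding, every match has to re-check the class and the
-- shape of the first child (a paragraph holding exactly one image).
captionOfDiv :: Block -> Maybe [Block]
captionOfDiv (Div (_, classes, _) (Para [Image{}] : capt))
  | "figure" `elem` classes = Just capt
captionOfDiv _ = Nothing
```

The file loads in GHCi as is. The second function is the one that every writer and filter would have to repeat (or share) under the Div encoding.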

mb21 commented

@hrehfeld Thanks for your input!

jgm commented
mb21 commented

Yet another syntax variant:

::: {}
![](gull.jpg)

> A gull
:::
  • Using the contents of the last blockquote in a div as the caption.
  • Can include arbitrary block content in both content and caption.
  • Obvious syntax and fallback for other markdown parsers, even though a blockquote is of course not semantically comparable to a caption.

For the short caption, the quote should end in a span (or maybe don't require the span-attribute and maybe require it to be in its own paragraph?):

::: {}
![](gull.jpg)

> A gull with a very long caption
>
> [A gull]
:::

A div would be treated as a figure if it starts with a paragraph containing one or more images (and nothing else).

For my 2 cents, I'm not at all a fan of this idea, tbh. In my experience, explicit is almost always better than implicit, and it raises a question of "how do I make a proper div with an image that's not interpreted as a figure". Some sort of extension (kinda like implicit_figures) might work, e.g. implicit_figure_divs, so that "figure divs" would require an explicit figure class without it enabled, but that sounds very similar to what we're trying to avoid here, no? My vote here is for non-ambiguous syntax.

As for subfigures, I kinda like my approach taken in pandoc-crossref. Subfigures are a list of Para of Image (with optional SoftBreak or Space in between). So, f.ex.,

:::{#fig:subfigure}
![a](image-in-row-1-1.png)
![b](image-in-row-1-2.png)
![c](image-in-row-1-3.png)

![d](image-in-row-2-1.png)
![e](image-in-row-2-2.png)
![f](image-in-row-2-3.png)

Figure caption
:::

will be rendered in a 3x2 grid kinda like this:

a b c
d e f

Figure caption

Line breaks and spaces are optional; this would be equivalent:

:::{#fig:subfigure}
![a](image-in-row-1-1.png) ![b](image-in-row-1-2.png)![c](image-in-row-1-3.png)

![d](image-in-row-2-1.png)![e](image-in-row-2-2.png)
![f](image-in-row-2-3.png)

Figure caption
:::

Note, however, that this syntax kinda only makes sense if figures are explicit in Markdown syntax, with none of that "if a Div contains X, then it's a figure" business. Otherwise, too much overlap for my taste.
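
As an illustration of the grid rule described above (each paragraph of images is one row, Space and SoftBreak are ignored, and whatever follows is the caption), here is a minimal sketch. It is not the pandoc-crossref implementation; the types are simplified stand-ins, `Image` carries only its alt text, and the names `rowOf` and `splitFigure` are made up for this example.

```haskell
-- Simplified stand-ins; Image carries only its alt text here.
data Inline = Str String | Space | SoftBreak | Image String
  deriving (Eq, Show)

data Block = Para [Inline] | Other String
  deriving (Eq, Show)

-- A paragraph is a grid row if, ignoring Space and SoftBreak, it holds
-- one or more images and nothing else.
rowOf :: Block -> Maybe [String]
rowOf (Para inls) =
  case filter (`notElem` [Space, SoftBreak]) inls of
    []   -> Nothing
    rest -> traverse image rest
  where
    image (Image alt) = Just alt
    image _           = Nothing
rowOf _ = Nothing

-- Split a figure div's body into leading image rows and the trailing
-- caption blocks.
splitFigure :: [Block] -> ([[String]], [Block])
splitFigure (b : bs)
  | Just row <- rowOf b =
      let (rows, capt) = splitFigure bs
      in (row : rows, capt)
splitFigure bs = ([], bs)
```

Both spellings of the example above produce the same two rows, since only the paragraph boundaries matter. The sketch assumes the div is already known to be a figure; it does nothing to decide that.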

If you go implicit figure divs route, I'd like to at least have an explicit marker for figure captions, that is, I'm not really a fan of using the first/last Para that's not an Image. Perhaps taking a page from current table caption syntax might work, requiring captions to have <word>: in the first paragraph (where <word> is any sequence of characters, including no characters, without spaces/breaks)? I.e.

:::
![](some-image.png)

: Caption
:::

Otherwise, it'd be very tricky to do something like

:::{.warning}
![](screenshot-snippet.png)

If you see something like this on your screen, your computer is about to explode, RUN!
:::
jgm commented

For my 2 cents, I'm not at all a fan of this idea, tbh. In my experience, explicit is almost always better than implicit, and it raises a question of "how do I make a proper div with an image that's not interpreted as a figure". Some sort of extension (kinda like implicit_figures) might work, e.g. implicit_figure_divs, so that "figure divs" would require an explicit figure class without it enabled, but that sounds very similar to what we're trying to avoid here, no? My vote here is for non-ambiguous syntax.

In pandoc we've tried to avoid using English language words to mark things, which tells against using an explicit figure class. (A document in Chinese should not be strewn with Latin characters.) My guess was that divs that start with a paragraph containing just images but aren't meant to be figures are rare enough that the implicit approach would work (together with a way to defeat it in rare cases, e.g. a nofigure class). This would be much like the current implicit_figures extension, which hasn't been a big problem from that point of view (paragraphs just containing an image being pretty rare).

:::{.warning}
![](screenshot-snippet.png)

If you see something like this on your screen, your computer is about to explode, RUN!
:::

I'll concede that this is a good counterexample to my claim!

As for subfigures, I kinda like my approach taken in pandoc-crossref. Subfigures are a list of Para of Image (with optional SoftBreak or Space in between). So, f.ex.,

:::{#fig:subfigure}
![a](image-in-row-1-1.png)
![b](image-in-row-1-2.png)
![c](image-in-row-1-3.png)

![d](image-in-row-2-1.png)
![e](image-in-row-2-2.png)
![f](image-in-row-2-3.png)

Figure caption
:::

will be rendered in a 3x2 grid kinda like this:

I like the grid idea and propose we adopt it in whatever we end up with. (Do you actually use a tabular to render this, or just let the images adjoin each other on the line?)

I'm not really a fan of using the first/last Para that's not an Image.

That's not exactly the proposal. Everything after the images part would be the caption. It can contain arbitrary block-level content, on this proposal. That may actually be a bit too liberal. For example, do we want to allow other captioned elements, like tables, in a caption? Probably not. But I can imagine lots of people who will want to have (non-captioned) tabular content inside a caption. One of the motivations for this issue is expanding what can go in a figure caption.

As I see it, our options are:

  1. Go with the full implicit div approach, with some way of manually disabling it (nofigure class).

  2. Require a more explicit marking of the div, using some English (or other) language terms. This could be fairly unobtrusive, as with your #fig:, but this would still jump out in a Chinese text.

  3. Try to be cleverer about the implicit div approach, putting more restrictions on what the container looks like. For example, as @mb21 suggests, we could require that it start with images and end with a blockquote, which would be the caption. Or as I mentioned earlier, we could require an hrule to separate the image part and the caption.
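
For reference, option 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the types are simplified stand-ins for the pandoc AST, `Image` carries only a file name, and the `nofigure` class is the opt-out suggested earlier in this thread, not an existing pandoc feature.

```haskell
-- Simplified stand-ins for the pandoc AST; Image carries only a file name.
type Attr = (String, [String], [(String, String)])

data Inline = Str String | Space | SoftBreak | Image String
  deriving (Eq, Show)

data Block = Para [Inline] | Div Attr [Block]
  deriving (Eq, Show)

-- True if the inlines are one or more images, optionally separated by
-- Space or SoftBreak, and nothing else.
onlyImages :: [Inline] -> Bool
onlyImages inls = not (null imgs) && all isImage imgs
  where
    imgs = filter (`notElem` [Space, SoftBreak]) inls
    isImage (Image _) = True
    isImage _         = False

-- Option 1: a div is a figure if it starts with an images-only paragraph,
-- unless it carries the (hypothetical) nofigure class.
isImplicitFigure :: Block -> Bool
isImplicitFigure (Div (_, classes, _) (Para inls : _)) =
  "nofigure" `notElem` classes && onlyImages inls
isImplicitFigure _ = False
```

Under this rule the .warning div with a screenshot from earlier in the thread is classified as a figure unless the author adds nofigure, which is exactly the ambiguity options 2 and 3 try to avoid.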

mb21 commented

If we use the dash in # my unnumbered title {-} as a precedent for "here comes an English word that shall not be spoken", we could do:

::: -
something
:::

Now, the question is: should - stand for .nofigure or .figure :S

Do you actually use a tabular to render this, or just let the images adjoin each other on the line?

Both, actually. There are switches (metadata variables) that control whether it's rendered as a table, or just adjoined images with line breaks. And it's a whole other story with LaTeX (long story short: subfloat from subfig package, and a mix of RawBlocks and RawInlines; it might be a good idea to use subcaption package instead).

It [caption] can contain arbitrary block-level content, on this proposal.

This... doesn't really make much sense to me, tbh. I struggle coming up with a reasonable example of a figure/float caption that would even consist of more than one paragraph, let alone contain tables or other block-level elements. But okay, maybe I'm just hung up on LaTeX limitations.

As I see it, our options are ...

Out of these three, I'm kinda leaning towards the last option, "blockquoted" caption in particular, since it looks the least ambiguous. That said, actual use might be somewhat cumbersome -- there would be lots of extraneous > which don't actually have any semantic meaning and are just there as a quirk of the syntax (which sounds rather bad if I put it this way I guess). It wouldn't be that bad for reasonably short captions though (1-2 paragraphs), and as I said, I can't imagine a reasonable use-case for something much longer than that.

If we use the dash in # my unnumbered title {-} as a precedent for "here comes an English word that shall not be spoken", we could do

Only I believe that should be

::: {-}
something
:::

for the sake of consistency?

Now, the question is: should - stand for .nofigure or .figure :S

My intuition would suggest that {-} disables something, so I would expect - standing for .nofigure. Maybe that's just me though.

jgm commented

It [caption] can contain arbitrary block-level content, on this proposal.
This... doesn't really make much sense to me, tbh. I struggle coming up with a reasonable example of a figure/float caption that would even consist of more than one paragraph, let alone contain tables or other block-level elements. But okay, maybe I'm just hung up on LaTeX limitations.

See #4229 for one request for multiparagraph captions. As far as I can see, there's no LaTeX limitation preventing this (except that if you have multiple paragraphs, you must specify the optional short caption argument to \caption). Certainly it might be sensible to have a list in a caption, or a block quote. See also #1024 for a proposal for block-level content in table captions. If this posed a problem in multiple output formats, that could be a reason for disallowing it. But it seems possible in LaTeX, HTML, ...

It's definitely worth making sure there's a strong reason for multiparagraph captions before we take that step.

there would be lots of extraneous > which don't actually have any semantic meaning and are just there as a quirk of the syntax (which sounds rather bad if I put it this way I guess).

Yes, I have the same reservation. The hrule proposal might be nicer, actually:

::: {#myfig}
![the image](img.jpg)

____
caption goes here
:::

I'm not wild about the {-} idea. {-} is already used for unnumbered headers. And it's pretty cryptic what it's supposed to mean.

mb21 commented

[block captions are] possible in LaTeX, HTML, ...

That's enough of a reason to make the AST so, I think.

About the hrule vs >: I really think both are fine. I agree that semantically, the hrule makes more sense. But for the most common use-case of one-paragraph captions, the > is less to type and takes up one line less.

But if we keep the current implicit_figures extension around for the most common use-case, then we can err on the side of making things more explicit for more complex figures: then I would be totally fine with requiring the attribute with id, and using one more line for the hrule.

But if we keep the current implicit_figures extension

Please do, because backwards compatibility. Rewriting all those documents using a new figure syntax would be a huge pain.


A side question: since as far as I understand the intention is to introduce a new AST element for figures specifically, why are we hung up on re-using existing syntax elements to define a new one? I mean, can't we invent a syntax for figures (or, more generally, "floats" in LaTeX terms) specifically?

mb21 commented

can't we invent a syntax for figures (or, more generally, "floats" in LaTeX terms) specifically?

We certainly could, but considering how long it took to agree on a native div syntax, the reasoning was to use something that resembles that, so that people don't have to learn yet another completely new syntax.

jgm commented

It's not decided to introduce a new AST element for figures.

Okay then, the AST change label on this issue confused me.

The syntax issue is orthogonal to this.

Well, certainly. Obviously, any syntax could, within reason, be parsed as any AST element; that's pretty much the meaning of A in AST. There's a catch, though: in my experience, it's good to have at least some correspondence between AST and the syntax, to keep parser complexity reasonably low and yourself reasonably sane ^_^

While pondering this, I came up with a bit of a middle ground between reusing div syntax and coming up with a new one, that might be agreeable. So here's a quick proposal:

We'd, generally, like to avoid ambiguity between figure divs and regular divs. Besides heuristics, classes seem like an obvious (and explicit) choice, but using English is discouraged for i18n reasons. But we don't really have to use alphanumerics for classes now, do we? So how about "figure divs" requiring ! class? (! because Markdown image syntax uses ! to differentiate from links -- seems like an obvious choice).

Case 1:
:::{#regulardiv}
![](picture.png)
---
This is a regular div, despite starting with an image and containing <hr>
:::

Case 2:
:::!
![](picture.png)
---
This is a figure div without an id (or possibly with automatic id?)
:::

Case 3:
:::{.! #figureId}
![](picture.png)
---
This is a figure div with an id
:::

Case 4:
:::! {#figureId}
![](picture.png)
---
This syntax doesn't really work as of Pandoc 2.2.1, but seems
like an obvious extension of case 2
:::

IIRC, CSS doesn't really understand non-alphanumeric classes anyway, so ! class by itself wouldn't be too meaningful in most output formats, consequently it doesn't seem to shadow anything in terms of functionality.

This syntax is also agnostic wrt AST representation -- if down the line AST is changed to include explicit "float" elements, it's distinct enough as to not be considered "stealing" syntax from divs.

jgm commented

Actually, Pandoc allows ! class. Well, kind of:

$ pandoc --version
pandoc 2.2.1
Compiled with pandoc-types 1.17.5.1, texmath 0.11.0.1, skylighting 0.7.2
$ echo -e ':::!\ntest\n:::\n' | pandoc -t native
[Div ("",["!"],[])
 [Para [Str "test"]]]

It doesn't parse in the attribute list though (that is, inside {}).

Has the thinking evolved about this issue?

Since a new Table block element has been included in the AST, there is an argument that a specific Figure type also belongs there.

As the author of pandoc-plot, I've gotten questions about subfigures (e.g. LaurentRDC/pandoc-plot#4). This is not really possible right now, but would become feasible with a Figure Attr Caption [[Inline]] element. Using a Div [Block] for figures doesn't allow for e.g. 2x2 subfigures.

I can understand if a Div is preferred for backwards-compatibility. If this is the decision, then I would like to move it forward.

mb21 commented

Yes, I think we still want to do the Figure element.. but still not decided what's the best way...

Great. Having thought about it a bit more, the fundamental difference for me between a full figure (à la LaTeX) vs. the current simple pandoc figures is the support for subfigures in two dimensions, with each subfigure having the possibility to have its own caption. Something like this:

data Subfigure = Subfigure Attr Caption Inline

data Block = ... 
           | Figure Attr Caption [[Subfigure]]
mb21 commented

Figure will probably contain a [Block]... which may or may not contain a Figure again.... you might want to read the issue starting from around here...

Captions

Markdown and HTML (and MediaWiki etc.) support six heading levels; plain LaTeX supports up to seven named levels (\part, \chapter, \section, \subsection, \subsubsection, \paragraph, \subparagraph). Most authors reserve the top level for single use at the start of the document, i.e. as the title (although HTML and LaTeX have different dedicated markup for them), if they use it at all. Most style guides tell writers to avoid more than three heading levels, but in technical documents deeply nested hierarchies do occur. The general syntax could support deeper levels as well: just repeat the prefix (and optional postfix) character # more often.

My point is, captions could use the lowest heading level already available:

###### Caption

![text](target) 

or another level could be introduced for them, systematically:

####### Caption

![text](target) 

Contents

In modern forum, blog and chat software and social websites, plain links are often automatically converted to informative “cards” by fetching metadata like title, author and cover image. Links to media files, audio and video recordings in particular, are also displayed with embedded playback controls. These can hardly be distinguished, conceptually, from traditional (floating) figures.

I therefore suggest that implicit figures shall support any number and combination of links ([foo](bar), [foo][baz], [foo][], [foo], <bar>) and embedded media (![foo](bar), ![foo][baz], ![foo][], ![foo]) as long as they are the only contents of a paragraph. In practice, authors will often put each one in a line of its own, but this, probably, cannot be relied upon.

A single, complex figure:

![text](target)
![text](target) 

Another single, complex figure:

![text](target)![text](target) 

Two simple figures:

![text](target)

![text](target) 

@Crissov Pandoc Markdown does not have a limit on header level. Additionally, many other formats don't either, and we want Pandoc Markdown to be (at least somewhat) interoperable with those. So that's a hard blocker to your proposal.

It would be great to see support for figures as essentially wrappers that associate a caption with some content.

Both HTML and JATS XML allow a fairly wide range of content inside their <figure>/<fig> elements, including media, diagrams and equations, and it would be extremely useful if this was preserved when converting between formats.

Just to collect some thoughts from reading the thread above and looking through some of the supported formats:

There are two sorts of figures. One type is a floating captioned container, which would most easily have this type in Pandoc:

-- Caption from Table could be removed.
data Block
  = ...
  | Figure Attr Caption CaptionPos FigureWidth [Block]
  ...

-- A Figure with [Table...] or [CodeBlock...] content could be a
-- captioned table or listing (for numbering or in output, if there
-- are separate captions for those elements).
-- Not sure how the Figure and Table Attrs would be handled in
-- HTML output in that case. Just use the Figure's? Or merge.

-- A Figure with [Plain [Image...]] content (or a Figure with 
-- a sequence of those figures as content) could be a 
-- gallery-type figure (the second kind).

-- Caption position is frequently customizable
data CaptionPos = CaptionBelow | CaptionAbove

-- The figure width is necessary for subfigures in many formats.
-- Handling it like Table columns (fractions of the enclosing
-- container width) should work.
data FigureWidth = FigureWidth Double | FigureWidthDefault

Some support for this type of figure, that I know of:

  • HTML5 has a <figure> with any sort of content, including nested tables and figures.
  • LaTeX has the figure environment for figures that can have a lot of block-like content in it, but not figures or tables. There is the popular subcaption package that allows for subfigures and subtables, but isn't designed for further nesting. (It seems subfigures can be nested, since they're implemented with minipage, but the caption numbering doesn't work properly). That package may also not work with some journals' templates, from what I've read.
  • DocBook has a <figure> element that allows a lot of block-like content, but forbids figures, tables, equations, and examples in it. DocBook 5.2 seems to have a <formalgroup> element for grouping figures (allowing for one level of figure/table nesting, like LaTeX).
  • JATS seems to support <fig-group> and <fig> elements that work sort of like the corresponding elements in DocBook 5.2, according to the archiving tag set.
  • ConTeXt has floating environments with optional captions, I think. Not sure how well they work when nested.

The other type of figure is like an image gallery: either a single captioned image, or a sequence of captioned images that can be collectively captioned. (There are also grid versions, but the sequence version seems more popular). This has the type:

-- Not sure if FigureWidth is desirable for Figure itself 
-- in this version (Table doesn't have it).
data Block
  = ...
  | Figure Attr Caption CaptionPos FigureWidth Figures
  ...

-- The [Inline] is for alt text like Image has, but this could be Text
-- instead, since the caption can be put into Caption.
data Figures
  = OneFigure Attr [Inline] Target
  | Gallery [Subfigure]

-- Same remarks on [Inline] vs Text for the alt text apply here
data Subfigure = Subfigure Attr Caption CaptionPos FigureWidth [Inline] Target

Some support for this type of figure:

  • Any of the formats that support the first kind of figure.
  • LaTeX also has the older subfig package that can handle this kind of gallery figure, which has the virtue of being more compatible with some journal templates, again from what I've read.
  • ConTeXt has combinations.
  • MediaWiki has galleries.

The first (container) type has the advantage of expressiveness, but most writers that support figures would need to decide how to deal with unsupported Figure use (mostly detecting and handling too-deep nesting).

The second (gallery) type has the advantage of not requiring the writers to come up with fallback strategies, certainly not as often. It's also a bit easier to deal with it (no detecting if a Figure represents a captioned Table or CodeBlock). But it is less expressive.

The second (gallery) version could be a grid, but I don't know how well that's supported in the outputs. Possibly support could be added with whatever native tables the output supported, if there wasn't native support for the grid version.

That is my understanding of HTML5 figures as well.

The figure element represents some flow content, optionally with a caption, that is self-contained (like a complete sentence) and is typically referenced as a single unit from the main flow of the document.

From the spec. Both it and MDN mention that it can be (or usually is) used for content that can be moved elsewhere without affecting the main flow of a document, and is best referenced in the text with a label like "Figure 7" so it can be moved. That's how floats are used in LaTeX, though LaTeX is much happier to move floats around by default than web browsers are figures, and will automatically put "Table 3" or "Figure 2" in the caption for you.

I also prefer the semantically cleaner container version of figures. Their recursive structure does mean that figures inside Pandoc may be nested more deeply than figures are allowed in the output, so it's important to know what extensions (LaTeX packages and the like) can be used in the outputs to deal with nested figures.

The HTML writer currently has to deal with the fact that HTML headings only go up to h6. Its fallback when encountering a Header past 6 is to render it as a paragraph with the heading class.

So, eventually, the writers that have depth-limited figures and tables could keep track of the current figure depth. If they encounter too-deep nesting, they could convert the Figure to a Div containing its body and a Div caption (with appropriate classes), then attempt to render that. Otherwise they would render figures (and tables and galleries) however they're supported in the output.
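
A rough sketch of that depth-tracking fallback, to show the shape of the idea rather than how any pandoc writer does it. The types are simplified stand-ins, the `Figure` constructor and its field order are hypothetical, and the `figure` and `caption` class names on the fallback Divs are assumptions.

```haskell
-- Simplified stand-ins for the pandoc AST; NOT the real pandoc-types definitions.
type Attr = (String, [String], [(String, String)])

data Block
  = Para String
  | Div Attr [Block]
  | Figure Attr [Block] [Block] -- hypothetical: attributes, caption, body
  deriving (Eq, Show)

-- Rewrite every figure nested deeper than maxDepth into a Div holding the
-- figure's body followed by a Div with its caption. Top-level figures are
-- at depth 1. Captions are left untouched in this sketch.
limitDepth :: Int -> [Block] -> [Block]
limitDepth maxDepth = map (go 1)
  where
    go depth (Figure attr@(ident, classes, kvs) capt body)
      | depth > maxDepth =
          Div (ident, "figure" : classes, kvs)
              (map (go depth) body ++ [Div ("", ["caption"], []) capt])
      | otherwise = Figure attr capt (map (go (depth + 1)) body)
    go depth (Div attr bs) = Div attr (map (go depth) bs)
    go _ b = b
```

A writer for a format with one level of figures would call limitDepth 1 before rendering, one that supports subfigures would call limitDepth 2, and so on. The id and classes stay on the fallback Div so that references and styling still have something to attach to.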

Initially, of course, every writer would need to fall back in this way, except for figures with [Table...] and [Plain [Image...]] content, which would be rendered as tables and figures currently are. Then better support (for figures, subfigures, subtables, and galleries) could be added to the relevant outputs.

See #6782 for important info on accessibility.

Noting that #5994 depends on this.

tarleb commented

This was done in pandoc 3.