lierdakil/pandoc-crossref

Table references undefined after list items


I have come across a bug where, when a table in the simple_tables style is preceded by a list item, references to that table are undefined. For example, if I put the following in test.md and then run pandoc --filter pandoc-crossref test.md, I get an Undefined cross-reference: tbl:tbl1 error.

A reference to [@tbl:tbl1].

1. list item

            Column label
----------- --------------
Row label   1

: A table. {#tbl:tbl1}

On the other hand, both of the following run without issue:

  1. using the pipe_tables format:
A reference to [@tbl:tbl1].

1. list item

|           |Column label |
|-----------|--------------|
|Row label  | 1 |

: A table. {#tbl:tbl1}

  2. deleting the list item:
A reference to [@tbl:tbl1].

            Column label
----------- --------------
Row label   1

: A table. {#tbl:tbl1}

I am using pandoc 3.3 and pandoc-crossref v0.3.17.1 (both installed with Homebrew on macOS).

So the problem here is that, in your first example, the table caption isn't parsed as a table caption; it's parsed as a plain paragraph:

<!-- `pandoc -t html` output -->
<p>A reference to <span class="citation"
data-cites="tbl:tbl1">[@tbl:tbl1]</span>.</p>
<ol type="1">
<li><p>list item</p>
<table>
<thead>
<tr>
<th style="text-align: right;">Col</th>
<th style="text-align: left;">umn label</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">Row label</td>
<td style="text-align: left;">1</td>
</tr>
</tbody>
</table></li>
</ol>
<p>: A table. {#tbl:tbl1}</p> <!-- here is the issue -->

The quirk is that the table is parsed as part of the list item here, as you can see, and the list item ends before the caption. The table is parsed as part of the list item because there are (at least) 4 spaces at the start of the first non-empty line after the list item.
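To make the indentation rule above concrete, here is a minimal Python sketch. It is not pandoc's actual parser, only a simplified approximation of the rule as described in this thread; the function name `continues_list_item` and its `indent` parameter are hypothetical.

```python
def continues_list_item(lines, item_index, indent=4):
    """Approximate the rule described above: the first non-blank line after
    the list item at lines[item_index] is treated as part of that item when
    it starts with at least `indent` spaces.

    This is an illustration only, not pandoc's real parsing logic.
    """
    for line in lines[item_index + 1:]:
        if line.strip():  # first non-empty line after the list item
            leading_spaces = len(line) - len(line.lstrip(" "))
            return leading_spaces >= indent
    return False  # nothing follows the list item


# The simple_tables example: the header row is indented well past 4 spaces.
simple_table = [
    "1. list item",
    "",
    "            Column label",
    "----------- --------------",
    "Row label   1",
]

# The pipe_tables example: the table starts in column 0.
pipe_table = [
    "1. list item",
    "",
    "|           |Column label |",
    "|-----------|--------------|",
    "|Row label  | 1 |",
]

simple_absorbed = continues_list_item(simple_table, 0)
pipe_absorbed = continues_list_item(pipe_table, 0)
```

Under this approximation the simple table's header line is what pulls the table into the list item, while the pipe table is not absorbed because its first line is not indented.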

You can move the caption above the table; the list item will then be terminated before the table begins:

A reference to [@tbl:tbl1].

1. list item

: A table. {#tbl:tbl1}

            Column label
----------- --------------
Row label   1

If putting the table inside the list item was the intention (doesn't seem like it, but for the sake of argument), then adding 4 spaces before the caption should also work:

A reference to [@tbl:tbl1].

1. list item

            Column label
----------- --------------
Row label   1

    : A table. {#tbl:tbl1}

OK, nice, thanks for explaining that to me. I'm not sure if there's any way to catch this scenario and print a warning, but if so, that would be helpful for users like me. It took me quite a bit of trial and error even just to get down to a minimal example to reproduce the error.

I'm not sure either, but it likely would be easier to catch on the pandoc side.

For one, this is arguably pandoc's surprising behaviour (it seems to make sense when you think about it, but it's a weird, non-obvious parsing quirk).

For two, pandoc-crossref gets the AST from pandoc, so the only real option is detecting paragraphs that look like they might have been intended as table captions, and that's... well, not particularly robust. Technically, any paragraph that starts with table:, Table: or : might be one, and a table caption failing to parse is not the only way of getting these.
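As a rough illustration of that heuristic and why it is fragile, here is a small Python sketch. It is not part of pandoc-crossref and works on plain paragraph strings rather than the real pandoc AST; the names `CAPTION_LIKE` and `suspicious_paragraphs` are made up for this example.

```python
import re

# Prefixes named above: "table:", "Table:" or a bare ":" at the start
# of a paragraph, followed by some text.
CAPTION_LIKE = re.compile(r"^(?:[Tt]able)?:\s*\S")


def suspicious_paragraphs(paragraphs):
    """Return the paragraphs that look like a table caption that failed
    to attach to a table. Purely textual, so false positives are expected."""
    return [p for p in paragraphs if CAPTION_LIKE.match(p)]


paragraphs = [
    "A reference to [@tbl:tbl1].",
    ": A table. {#tbl:tbl1}",          # the orphaned caption from this issue
    "Table: not necessarily a caption",  # flagged too, possibly wrongly
    "Note: an ordinary paragraph",
]

flagged = suspicious_paragraphs(paragraphs)
```

The check flags the orphaned caption, but it also flags any other paragraph that happens to begin with one of those prefixes, which is the robustness problem mentioned above.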

So I suggest raising an issue upstream on https://github.com/jgm/pandoc and seeing what they say first.