jgm/commonmark-hs

[fuzz result] [commonmark-pandoc] footnotes in footnotes

Closed this issue · 3 comments

This is loosely related to jgm/pandoc#2053, but commonmark_x is different from pandoc flavored markdown. I'm not really sure which repository to report this against, but since commonmark-pandoc lives here, this is where I'm reporting it.

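When I run this session, I get a result that's definitely wrong: the reference inside the footnote links back to the footnote itself (#fn1), the id fnref1 is emitted twice, and the text of [^bar] never appears. This result came from a custom build of pandoc, but pandoc.org/try shows the same thing.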

$ pandoc -f commonmark_x -t html5
[^foo]:
    bar [^bar]

[^foo]

[^bar]: baz
^D
<p><a href="#fn1" class="footnote-ref" id="fnref1"
role="doc-noteref"><sup>1</sup></a></p>
<section id="footnotes" class="footnotes footnotes-end-of-document"
role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>bar <a href="#fn1" class="footnote-ref" id="fnref1"
role="doc-noteref"><sup>1</sup></a><a href="#fnref1"
class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>

However, commonmark-cli gives a correct result (correct in the sense that it agrees with GitHub):

$ stack run -- --extension=footnotes
[^foo]:
    bar [^bar]

[^foo]

[^bar]: baz
^D
<p><sup class="footnote-ref"><a href="#fn-foo" id="fnref-foo">1</a></sup></p>
<section class="footnotes">
<div class="footnote" id="fn-foo">
<div class="footnote-number">
<a href="#fnref-foo">1</a>
</div>
<div class="footnote-contents">
<p>bar <sup class="footnote-ref"><a href="#fn-bar" id="fnref-bar">2</a></sup></p>
</div>
</div>
<div class="footnote" id="fn-bar">
<div class="footnote-number">
<a href="#fnref-bar">2</a>
</div>
<div class="footnote-contents">
<p>baz</p>
</div>
</div>
</section>

I would expect Pandoc to do one of these things:

  1. Throw an error message and refuse to output anything.
  2. Treat the nested footnote as plain text, just like pandoc -f markdown does.
  3. Work the same way commonmark-cli and github do. I understand that Pandoc has no interest in supporting this, since it's bad typography and unsupported by some of Pandoc's output formats, so the first two options would make more sense.

jgm commented

pandoc -f commonmark_x -t native gives:

[ Para
    [ Note
        [ Para [ Str "bar" , Space , Note [ Para [ Str "baz" ] ] ] ]
    ]
]

which seems right. It's pandoc's HTML writer that is unable to handle this gracefully. So, we could address this there, or we could modify commonmark-pandoc so that it detects nested notes and emits

[ Para
    [ Note
        [ Para [ Str "bar" , Space , Str "[^bar]" ] ]
    ]
]

Or both, of course. One reason to address it in pandoc's HTML writer is that other readers may allow the nested notes to be created.
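For illustration, here is a minimal Python sketch of the "detect nested notes" idea, applied to a pandoc-style JSON AST where each node is a dict of the form {"t": tag, "c": contents}. The names strip_nested_notes and PLACEHOLDER are hypothetical, not part of pandoc or commonmark-pandoc. Since a Note node carries only its contents and not its label, the sketch substitutes a fixed placeholder string; a fix inside the reader could keep the original source text such as "[^bar]" instead.

```python
# Toy walk over a pandoc-style JSON AST (nodes are {"t": tag, "c": contents}).
# PLACEHOLDER and strip_nested_notes are hypothetical names, not pandoc APIs.

PLACEHOLDER = "[nested note]"


def strip_nested_notes(node, inside_note=False):
    """Return a copy of `node` in which any Note found inside another Note
    is replaced by a plain Str placeholder."""
    if isinstance(node, list):
        return [strip_nested_notes(child, inside_note) for child in node]
    if isinstance(node, dict) and "t" in node:
        if node["t"] == "Note":
            if inside_note:
                # A note within a note: degrade it to plain text.
                return {"t": "Str", "c": PLACEHOLDER}
            return {"t": "Note", "c": strip_nested_notes(node["c"], True)}
        if "c" in node:
            return {"t": node["t"], "c": strip_nested_notes(node["c"], inside_note)}
    return node  # strings, numbers, and leaf nodes such as Space


# The AST from the native output above, written as JSON-style dicts.
doc = [
    {"t": "Para", "c": [
        {"t": "Note", "c": [
            {"t": "Para", "c": [
                {"t": "Str", "c": "bar"},
                {"t": "Space"},
                {"t": "Note", "c": [
                    {"t": "Para", "c": [{"t": "Str", "c": "baz"}]},
                ]},
            ]},
        ]},
    ]},
]

flattened = strip_nested_notes(doc)
```

The same walk could run as a post-processing step before any writer, which would cover other readers that produce nested notes as well.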

First problem: you'd have to fix a lot of writers. The other writers I tried drop the inner note's contents as well:

$ pandoc -f commonmark+footnotes -t commonmark+footnotes
[^1]

[^1]: bar [^2]
$ pandoc -f commonmark+footnotes -t latex
\footnote{bar \footnotemark{}}
$ pandoc -f commonmark+footnotes -t rst
 [1]_

.. [1]
   bar  [2]_

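Second problem: Pandoc duplicates the footnote each time I refer to it. That's harmless in Pandoc-flavored Markdown, where a nested footnote is treated as plain text, but with recursive footnotes the copies multiply at every level of nesting, so you probably want a recursion limit. A 12K input takes almost three minutes and more than 6 GiB of memory: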

$ cat checker.py 
print("[^lol1]")
print("[^lol8]: lol")
for a in range(1, 8):
    print(f"[^lol{a}]:")
    for b in range(10):
        print(f"    [^lol{(a + 1)}]")
$ python checker.py > test-in.md && du -h test-in.md
12K	test-in.md
$ cgmemtime -- pandoc -f commonmark+footnotes < test-in.md > test-out.md

user: 167.129 s
sys:  003.155 s
wall: 170.746 s
child_RSS_high:    6619136 KiB
group_mem_high:    6540472 KiB
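A rough way to see where the time and memory go: if every reference is replaced by a full copy of the note it points to, the number of Note nodes grows by a factor of about ten per level. The sketch below counts the copies for an input of the same shape as the one checker.py generates. REFS and expanded_notes are illustrative names, and the count assumes the notes contain no cycles.

```python
from functools import lru_cache

# Same shape as the input generated by checker.py: notes lol1..lol7 each
# reference the next note ten times, and lol8 is a leaf.
REFS = {f"lol{a}": [f"lol{a + 1}"] * 10 for a in range(1, 8)}
REFS["lol8"] = []


@lru_cache(maxsize=None)
def expanded_notes(label):
    """Number of Note nodes produced by one reference to `label` when every
    reference is replaced by a full copy of the note it points to."""
    return 1 + sum(expanded_notes(child) for child in REFS[label])


total = expanded_notes("lol1")
print(total)  # 11111111
```

So the single [^lol1] reference stands for roughly eleven million notes once fully expanded, which fits the multi-gigabyte memory use reported above.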

jgm commented

Pushed a fix that gives

[^foo]:
    bar [^bar]

[^foo]

[^bar]: baz
^D
[ Para [ Note [ Para [ Str "bar" , Space , Str "" ] ] ] ]

I think that's good enough.