jgm/pandoc

Implement ODT support for implicit_figures

iandol opened this issue · 32 comments

Hi, the documentation for implicit_figures says "This feature is not yet implemented for RTF, OpenDocument, or ODT". MultiMarkdown does support figure captions for block-level images, with this MMD:

![1—A medieval illustration of the ventricular theory of sensory perception, in which sense information (apart from touch in this case) is transferred into the *sensus communis* of the first of the 3 supposed ventricles for initial processing. This version is contained in the *Margarita Philosophica* {Reisch, 1504, #69055}](MargaritaPhilosophica.jpg)

generating the following FODT:

<text:p>
<draw:frame text:anchor-type="as-char" draw:z-index="0" draw:style-name="fr1" svg:width="95%">
<draw:text-box>
<text:p>
<draw:frame text:anchor-type="as-char" draw:z-index="1" >
<draw:image xlink:href="MargaritaPhilosophica.jpg" xlink:type="simple" xlink:show="embed" link:actuate="onLoad" draw:filter-name="&lt;All formats&gt;"/>
</draw:frame>
</text:p>
<text:p>Figure <text:sequence text:name="Figure" text:formula="ooow:Figure+1" style:num-format="1"> Update Fields to calculate numbers</text:sequence>: 1—A medieval illustration of the ventricular theory of sensory perception, in which sense information (apart from touch in this case) is transferred into the <text:span text:style-name="MMD-Italic">sensus communis</text:span> of the first of the 3 supposed ventricles for initial processing. This version is contained in the <text:span text:style-name="MMD-Italic">Margarita Philosophica</text:span> {Reisch, 1504, #69055}
</text:p>
</draw:text-box>
</draw:frame>
</text:p>

Will implicit_figures for ODT be supported in pandoc at some point in the future, and is there a timeline more or less? Thanks for an excellent tool!

jgm commented

No timeline. If someone wants to write the code (and adjust tests accordingly) and submit a PR, I'd certainly consider merging it.

Thanks for the info!

jgm commented

If you want to reopen this, the code sample you pasted in might be helpful if someone wants to add this feature.


OK, wish I could help but I really wouldn't know where to start, plus Haskell looks like Klingon to my poor Biologist-procedural-programmer eyes. I had a look at the Text.Pandoc.Writers.ODT.hs and see there is a blockToOpenDocument function but how to wrangle that to do the conversion...

Thanks John!

jgm commented

This will probably look more straightforward (and this is
what needs changing):

https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/OpenDocument.hs

See line 347, definition of 'figure'.


Also this is the code in MMD that handles images with optional captions:

https://github.com/fletcher/MultiMarkdown-4/blob/master/odf.c#L463

Uh... sorry, but what exactly is this about? #376, #2070?

I guess docs should've been updated, but other than that, I believe Pandoc supports implicit_figures with OpenDocument/ODT output.

@lierdakil: hm, going from MMD -> ODT with the example above, no image captions are generated. Perhaps this fails only for MMD input, though I think the syntax is the same. If I use:

![This is the caption](Beast_mmd/eyes.png)  

I get no caption even as a subsequent paragraph:

<text:p text:style-name="First_20_paragraph"><draw:frame draw:name="img1" svg:width="397pt" svg:height="400pt"><draw:image xlink:href="Pictures/0.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" /></draw:frame></text:p>

using the following command line: `pandoc test.md --from markdown_mmd -o testPD.odt`. This is pandoc 1.15.0.6 on OS X 10.11.1.

Right, if I omit the MMD --from then I get the caption, so for some reason this is being ignored for MMD, though MMD itself supports it...

Try explicitly enabling:
'-f markdown_mmd+implicit_figures'
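
For anyone scripting this, here is a minimal sketch of how the reader string with extra extensions could be assembled. The helper name `pandoc_mmd_command` is made up for illustration; only the `--from markdown_mmd+implicit_figures` syntax and the file names come from this thread, and the sketch only builds the argument list without running pandoc.

```python
def pandoc_mmd_command(source, target, extensions=("implicit_figures",)):
    """Build a pandoc argv that reads MultiMarkdown with extra extensions.

    Hypothetical helper: each extension name is appended to the reader
    name with a leading "+", e.g. markdown_mmd+implicit_figures.
    """
    reader = "markdown_mmd" + "".join("+" + ext for ext in extensions)
    return ["pandoc", source, "--from", reader, "-o", target]


if __name__ == "__main__":
    # Print the command line instead of executing it, so the sketch
    # runs even where pandoc is not installed.
    print(" ".join(pandoc_mmd_command("test.md", "testPD.odt")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where pandoc is installed.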

Yes, that works @lierdakil thank you. So the question is if there is a reason the extension is not enabled by default for MMD (as both support the same syntax for the same feature), if there is then this can be closed.

MMD also wraps the figure caption in a frame, which is slightly cleaner structurally, and appends a figure number sequence (auto-numbering that can be referenced). The XML seems pretty straightforward for this. No great issue, and I could probably even hack this myself, but I wonder if there are reasons against using a frame (1) and adding numbering (2). The argument against (2) is that it wouldn't apply across output formats (does HTML even support auto-numbering?). But wrapping the caption in a frame is what LibreOffice does by default, I think.

Automatic numbering is not something we can replicate in other output
formats, at least not at the moment. There is pandoc-crossref, but it will
insert figure numbers as plaintext in odt output.

As for the frame, I briefly considered implementing it, but the XML turned out to be
much less straightforward than I felt was worth it, especially considering the
different rendering implementations in OpenOffice, LibreOffice and MS Word.

So, @jgm, do you suppose we could add Ext_implicit_figures to multimarkdownExtensions? I don't think this is new, so I'm not exactly sure why it isn't included.

Thanks @lierdakil, using a frame is no big issue. And I understand the issue with auto-numbering, though pandoc does have other features that only some formats support. But that is for another issue.

I also notice subscript and superscript extensions have to be explicitly enabled, and again these are things MMD supports too...

jgm commented


It may be that some features were added to MMD since I added the
markdown_mmd option to pandoc. It will be an easy change
to add these.

Here is the documentation FYI for subscript and superscript support in MMD:

http://fletcher.github.io/MultiMarkdown-4/MMD_Users_Guide.html#superscriptsandsubscripts

jgm commented

Subscripts and superscripts work differently in MMD.
You can do e^2 and a~1, where in pandoc you need to do e^2^ or a~1~.
Still, since the pandoc-style ones WILL work in MMD, enabling these options seems fine.
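
To make the difference concrete, here is a rough sketch of rewriting the MMD-style short forms (`e^2`, `a~1`) into the delimited pandoc forms (`e^2^`, `a~1~`). It is an illustration only: the function name `to_pandoc_scripts` is made up, and it assumes a bare script runs over word characters only, which is a simplification and not a statement of MMD's actual parsing rules.

```python
import re


def to_pandoc_scripts(text):
    """Close bare MMD-style super/subscripts: e^2 -> e^2^, a~1 -> a~1~.

    Assumption: a bare script is the run of word characters after the
    marker. Spans that are already delimited (e^2^) are left untouched.
    """
    for marker in ("^", "~"):
        m = re.escape(marker)
        # Alternative 1 consumes an already-delimited span whole, so its
        # closing marker is never mistaken for the start of a new script.
        # Alternative 2 matches a bare marker followed by word characters.
        pattern = re.compile(r"(%s[^\s%s]+%s)|%s(\w+)" % (m, m, m, m))

        def close(match, marker=marker):
            if match.group(1) is not None:
                return match.group(1)
            return marker + match.group(2) + marker

        text = pattern.sub(close, text)
    return text


if __name__ == "__main__":
    print(to_pandoc_scripts("e^2 and a~1"))
```

Since the pandoc-style forms also work in MMD, text normalized this way should remain valid for both tools.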

jgm commented

@iandol I think the numbering is a bit problematic, without some mechanism for localization -- we don't want to bake in the word "Figure" as the XML above does. But putting the whole thing in a frame seems worth doing and shouldn't be too complex. @lierdakil what difficulties did you encounter? I don't think we need to worry about other formats. It would be good to do this in Word too, but I see no reason not to do it in ODT even without doing it in Word.

@jgm, it was a while ago, so the details are somewhat fuzzy. What I can remember right off the bat is that frame dimensions were messed up between OO and LO due to their different rendering strategies, and the only way I was able to make it work in both was setting frame dimensions in pixels. I'm no ODF expert though, so I might have missed an obvious solution.

P.S. And when I was talking about Word, I meant its ODT renderer, not docx.

Even with images as they currently are (without the nested frames) I often have to do manual resizing in odt (Libreoffice), so I don't think that should be a show stopper. @lierdakil - perhaps whatever code you wrote before is worth trying again in the latest releases of OO.org and LO, if you still have it.

As for hardcoding "Figure", is there anything wrong with a writer-specific option?

@hubertp-lshift, I believe #3165 is for the ODT reader. This issue is about ODT writer output, so no, probably not.

jgm commented

This is a confusing thread. If I'm not mistaken, the only outstanding issue here is whether ODT figures can be put into a frame?

@jgm I think it all boils down to nobody having all three of time, skill and interest to do it. ODT figures definitely can be put in frames, as I've used perl to post-process pandoc output to that effect in the past.
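
As a rough illustration of the target structure, here is a sketch that builds the nested-frame fragment from the MMD sample at the top of this thread: an outer frame holding a text box, which in turn holds the image frame and the caption paragraph. It is not the perl script mentioned above and not pandoc's writer code. The function name `framed_figure` is made up, and the sketch deliberately leaves out styles, frame dimensions in absolute units, and figure numbering, all of which a real implementation would need to handle.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the namespace-qualified tag or attribute name."""
    return "{%s}%s" % (NS[prefix], name)


def framed_figure(href, caption, width="95%"):
    """Build text:p > draw:frame > draw:text-box > (image para, caption para)."""
    outer_p = ET.Element(q("text", "p"))
    outer = ET.SubElement(outer_p, q("draw", "frame"), {
        q("text", "anchor-type"): "as-char",
        q("svg", "width"): width,
    })
    box = ET.SubElement(outer, q("draw", "text-box"))
    # Inner frame: contains only the image.
    img_p = ET.SubElement(box, q("text", "p"))
    inner = ET.SubElement(img_p, q("draw", "frame"),
                          {q("text", "anchor-type"): "as-char"})
    ET.SubElement(inner, q("draw", "image"), {
        q("xlink", "href"): href,
        q("xlink", "type"): "simple",
        q("xlink", "show"): "embed",
        q("xlink", "actuate"): "onLoad",
    })
    # Caption paragraph sits inside the same text box as the image.
    cap_p = ET.SubElement(box, q("text", "p"))
    cap_p.text = caption
    return outer_p


if __name__ == "__main__":
    fig = framed_figure("Pictures/0.png", "This is the caption")
    print(ET.tostring(fig, encoding="unicode"))
```

Keeping the caption inside the outer frame's text box is what ties its width to the figure, which is the main structural difference from emitting the caption as a following paragraph.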

@jgm: yes, that is the only outstanding issue, and it probably still applies to DOCX as well as ODT unless something has changed (I didn't see anything in the changelog).

jgm commented

Figure numbers have been dealt with now in commit ecd4d5b.

We should work on putting figures and captions in proper frames rather than paragraphs. @pyssling any interest?

I'm looking at it. It should be doable.

I assume the point is to limit the width of the caption to the width of the figure which is useful if you want to place it on the side of a page with text wrapping around it or similar?

@jgm I've had a good look now. This isn't strictly speaking something I need right now, maybe later. This would be rather invasive. Basically we need to split dimension setting into two parts in the case where there is an outer and inner frame (the outer contains caption and figure, the inner one contains only the figure.)

This is made complicated by the fact that we post-process dimensions in the transformPicMath function in ODT.hs. This makes things awkward, to say the least.

Do you know why it's done this way? Maybe because this is where we actually push the image into the file and can therefore get the real dimensions?

jgm commented

This would definitely make it easier. Also easier for anyone reading the code to figure out what's going on. I'll look at the other writers and see if I can figure out how this would work.

This is old and I think not relevant anymore, time to close...