LaurentRDC/pandoc-plot

Multiple figures in the same block

HaMF opened this issue · 4 comments

HaMF commented

At the moment, only one figure is saved and placed per code block. In this matplotlib example, only the figure titled "3" will be in the final document:

# My document
```{.matplotlib}
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('1')

plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('2')

plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('3')
```

Would it be possible to support such a scenario and generate/place all figures in the generated document with reasonable effort?

Of course this would need some careful thought on at least the following matters:

  • How to place the figures: Will each figure have an individual node (for example, each figure has its own figure environment in LaTeX and uses \ContinuedFloat)?
  • IDs usually need to be unique. If an ID is given, which element will it be assigned to, or will each figure get a suffix such as "-1"?
  • How are links to the generating code handled?

Background: I'm generating a document which mainly consists of figures, and pandoc-plot seems like a perfect fit to automate this. Creating the figures for one subsection is very simple in a for loop.* I cannot use subplots because a combined figure would not fit on one page. (Actually, I don't even want to worry about how many pictures fit on one page when writing the input document (Markdown).) I am therefore looking for a solution which places the individual figures and lets the output processor (for example LaTeX) handle them floating around.

I'm aware that this is a corner case. If you think it adds too much complexity or you don't want to support multiple figures in pandoc-plot I understand and I'll look for a different solution.

Cheers,
Hannes

* An example would be plotting measurement vs. model for each measured temperature. In reality it's more complicated and the sections are already generated programmatically. Also, it's debatable whether creating such a PDF document is reasonable, but these are unfortunately the current boundary conditions.

pandoc: 2.9.2.1
pandoc-plot: v0.5.0.0

Hello,

That is a tricky situation. The reason pandoc-plot only generates one figure per code block is that there is no consistent way across programming languages to get all active figures.

Usually, plotting toolkits will have some function that returns the last figure (e.g. last_plot in ggplot2, or gcf in MATLAB). Not all toolkits keep track of all figures. It is possible to do this in Matplotlib like so:

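import matplotlib.pyplot as plt
from matplotlib.figure import Figure

# Each figure must be bound to a global name, otherwise it never
# appears in globals() and the scan below would find nothing.
fig1 = plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('1')

fig2 = plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('2')

fig3 = plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('3')

# Collect every Figure object reachable through the global namespace.
# list(...) takes a snapshot so the dictionary is not iterated while it changes.
all_figures = [obj for obj in list(globals().values()) if isinstance(obj, Figure)]
for index, fig in enumerate(all_figures, start=1):
    # save each figure individually
    fig.savefig(f'figure_{index}.png')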

This only works because Python has a global dictionary, globals(). Because it would be a nightmare to generalize this across programming languages I do not know (e.g. R), I would rather try to find a different solution for your problem at this time.

For example, do you know about plt.subplots to create sub-figures? You can create large "figures" with subplots. You can arrange them in rows/columns. In this case, there would be one single caption (and one single source file) for the whole figure. Is this acceptable?

HaMF commented

Hi Laurent,

thanks for your quick response. With matplotlib one can get all figures with

figures = list(map(plt.figure, plt.get_fignums()))

MATLAB works similarly if I remember correctly. (Edit: Not quite, it seems to be something along the lines of figHandles = findall(groot, 'Type', 'figure');) But you are of course right, implementing this for all supported plotting libraries would be quite tedious.
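For Matplotlib specifically, the `plt.get_fignums()` approach can be turned into a small save-everything helper. The following is only a minimal sketch and not something pandoc-plot does: the helper name `save_all_figures`, the output directory, and the file-name pattern are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt


def save_all_figures(out_dir, stem="figure", ext="png"):
    """Save every figure pyplot currently tracks, one file per figure.

    Hypothetical helper: the name and file-name pattern are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # plt.get_fignums() lists the numbers of all open figures;
    # plt.figure(num) returns the existing figure with that number.
    for num in plt.get_fignums():
        path = out_dir / f"{stem}-{num}.{ext}"
        plt.figure(num).savefig(path)
        paths.append(path)
    return paths


plt.close("all")  # start from a clean slate
for title in ("1", "2", "3"):
    plt.figure()
    plt.plot([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])
    plt.title(title)

saved = save_all_figures(tempfile.mkdtemp())
print([p.name for p in saved])
```

This relies on pyplot's own figure registry, so it only finds figures created through `plt.figure()` (or similar pyplot calls) that have not been closed.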

As mentioned, subplots unfortunately does not work for me -- there are simply more plots that logically belong together than fit on one page. One would have to split them up after all. I can think of other solutions (like generating the plots separately and resorting them later with another pandoc filter) but they all feel quite dirty. Support in pandoc-plot would just be the cleanest, that's why I thought I'd just ask. Thanks again for your answer and consideration!

If you don't plan to support it, I'd kindly suggest adding a note to the README that only the last figure gets saved and that support for multiple figures won't be included. (One can always revise the latter if things change, of course.)

I missed your explanation why you can't use subplots, my apologies.

This is not a use case that I see being common, so I am cautious about implementing it given the extra complexity. I will add a notice to the README, as you suggest.

Still, I think there are other alternatives you could pursue to get it working. Is there a reason why you might not want to separate the plots into their own code blocks? For example, you could factor out the common code between all of them (e.g. import matplotlib.pyplot as plt) into some file include.py and use the following document structure:

# Section 1

```{.matplotlib preamble=include.py}
plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('1')
```

```{.matplotlib preamble=include.py}
plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('2')
```

```{.matplotlib preamble=include.py}
plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('3')
```

Each of the code blocks becomes a float (in LaTeX, for example) which will get placed by the layout engine. You can have as many such code blocks per section as you like.

Creating the figures for one subsection is very simple in a for loop.

If you could modify the document generation to split each figure into its own Markdown code block, you would be done ;)
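As a rough illustration of that idea, a document generator could emit one code block per figure. This is a minimal sketch under assumptions: the function names `figure_block` and `section` are hypothetical, and the only pandoc-plot syntax used is the `{.matplotlib preamble=include.py}` header shown above.

```python
FENCE = "`" * 3  # built this way so the example itself can sit inside a code fence


def figure_block(body, preamble="include.py"):
    """Wrap plotting code in its own Matplotlib code block (hypothetical helper)."""
    header = f"{FENCE}{{.matplotlib preamble={preamble}}}"
    return "\n".join([header, body.strip(), FENCE])


def section(title, bodies):
    """Build a Markdown section with one code block per figure body."""
    parts = [f"# {title}"]
    parts.extend(figure_block(body) for body in bodies)
    return "\n\n".join(parts) + "\n"


# One plotting snippet per figure, generated in a loop.
bodies = [
    f"plt.figure()\nplt.plot([0,1,2,3,4], [1,2,3,4,5])\nplt.title('{n}')"
    for n in (1, 2, 3)
]
markdown = section("Section 1", bodies)
print(markdown)
```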

Let me know what you think

HaMF commented

Thank you very much for giving this that much thought. (It would have been easier if I shared an example from the beginning. I'm considering this for my day job and they tend to be picky about information security and I wanted to avoid troubles, sorry for that.)

There are two issues with generating one block per figure which stop me from doing it: 1) Overhead: reading the data is slow (seconds), and there are maybe 200 plots in total in the document. 2) "Editing convenience": I would like to avoid an explicit, complex preprocessing step and would like to invent as little syntax as possible. (Others will use it and it will become a maintenance burden.) The second point is more flexible; I just want to avoid creating yet-another-legacy-system...

I really don't want to use more of your time than I already have but since you asked/if you're interested: My initial plan was to introduce a YAML block in the beginning to generate the sections similar to:

# Section 1
```{.matplotlib}
repeat:
  temperature: [-10, 0, 10]
---
read_data()
for n in range(0,10):
    plt.figure(f'{temperature}: parameter {n}')
    # these plots are not necessarily generated in a loop but are completely different graphs
```

which will be expanded to something equivalent to

# Section 1
## Temperature -10
```{.matplotlib}
temperature = -10 

read_data()
for n in range(0,10):
    plt.figure(f'{temperature}: parameter {n}')
    # these plots are not necessarily generated in a loop but are completely different graphs
```

## Temperature 0
```{.matplotlib}
temperature = 0

read_data()
for n in range(0,10):
    plt.figure(f'{temperature}: parameter {n}')
    # these plots are not necessarily generated in a loop but are completely different graphs
```
...

Basically I invented a syntax for a simple foreach loop here. (Maybe this is already stupid 🤔) Implementing it is, however, easy and straightforward. Nesting the generation of the blocks would start being a pain though.
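To make the expansion concrete, here is a minimal sketch of such a foreach step. It assumes the `repeat:` header has already been parsed into a dictionary (the YAML parsing is left out), and the function name `expand_repeat` and the heading format are made up for illustration. Several parameters are handled as a flat Cartesian product, which is only one possible reading of "nesting".

```python
from itertools import product

FENCE = "`" * 3  # built this way so the example itself can sit inside a code fence


def expand_repeat(repeat, body, heading_level=2):
    """Expand one block into a heading plus code block per parameter combination.

    Hypothetical helper: `repeat` maps parameter names to lists of values.
    """
    names = list(repeat)
    chunks = []
    for values in product(*(repeat[name] for name in names)):
        pairs = list(zip(names, values))
        heading = ", ".join(f"{name.capitalize()} {value}" for name, value in pairs)
        # Each parameter becomes a plain assignment at the top of the block.
        assignments = "\n".join(f"{name} = {value!r}" for name, value in pairs)
        chunks.append("\n".join([
            f"{'#' * heading_level} {heading}",
            f"{FENCE}{{.matplotlib}}",
            assignments,
            "",
            body.strip(),
            FENCE,
        ]))
    return "\n\n".join(chunks) + "\n"


body = """
read_data()
for n in range(0,10):
    plt.figure(f'{temperature}: parameter {n}')
"""
expanded = expand_repeat({"temperature": [-10, 0, 10]}, body)
print(expanded)
```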

I bet there is a mature framework for this out there somewhere, but I haven't found anything really suitable. I'd be happy if you or anyone reading this has some pointers :)

Anyway, thanks again for your consideration and your thoughts! Discussing the matter is already helpful. (I'll close this for now; I think with the README change (thanks) it's resolved.)