LaurentRDC/pandoc-pyplot

Limited to matplotlib

mitinarseny opened this issue · 18 comments

Hello!

Let me first thank you for such an amazing filter! I was looking for exactly this kind of functionality and ran into this repo.

The only disadvantage of your approach is that with this filter I am limited to matplotlib and can't use any other library, such as Plotly. Matplotlib is great, but it lacks some features, and it would be awesome to add support for other libraries, too.

Instead of adding support for each library individually, I suggest implementing a more general mechanism (I can't make a PR as I do not know Haskell yet): an exporter argument holding a function that takes a filename and does whatever is needed to save the figure. For matplotlib it would look something like the following:

```{.pyplot type="svg" exporter="lambda filename: plt.savefig(filename)"}
plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('This is an example figure')
```
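
To make the proposal concrete, here is a minimal Python sketch of how a filter could apply such an exporter string. The helper `run_with_exporter` is hypothetical and not part of pandoc-pyplot; it assumes the exporter is evaluated in the same namespace as the code block, so it can see names like `plt`. A plain file write stands in for `plt.savefig` so the sketch has no third-party dependencies.

```python
import os
import tempfile


def run_with_exporter(script: str, exporter: str, filename: str) -> None:
    """Run a code block, then hand the output filename to its exporter.

    Hypothetical helper: pandoc-pyplot does not expose this function.
    """
    namespace = {}
    # Execute the code block so its names (figures, modules) are defined.
    exec(script, namespace)
    # Evaluate the exporter in the same namespace so it can see those names.
    export = eval(exporter, namespace)
    export(filename)


# Stand-in for a plotting script: it builds some "figure" content.
script = "import pathlib\ncontent = 'figure data'"
# Stand-in for `lambda filename: plt.savefig(filename)`.
exporter = "lambda filename: pathlib.Path(filename).write_text(content)"

target = os.path.join(tempfile.mkdtemp(), "figure.txt")
run_with_exporter(script, exporter, target)
```

Since the filter already executes the code block itself, evaluating the exporter string would not add a new trust boundary, but both come from the document author and should be treated accordingly.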

This approach allows you to use any library and any format that you want:

```{.pyplot type="html" exporter="lambda filename: py.offline.plot(hg_hist_fig, filename=filename)"}
import plotly as py
import plotly.graph_objects as go  # provides go.Figure, go.Scatter, go.Layout
import plotly.offline  # makes py.offline.plot available
import scipy as sp
import scipy.stats  # makes sp.stats available

# Hypergeometric distribution HG(80, 15, 20), matching the plot title.
hg = sp.stats.hypergeom(80, 15, 20)
# Full support of the distribution (a confidence level of 1 covers everything).
hg_hist_x = range(*map(int, sp.stats.hypergeom.interval(1, 80, 15, 20)))
hg_hist_fig = go.Figure(
    data=(go.Scatter(
        name='histogram',
        x=list(hg_hist_x),
        y=list(map(hg.pmf, hg_hist_x)),
        mode='markers',
    ),),
    layout=go.Layout(
        title=go.layout.Title(
            text=r'$$\xi \sim HG(80, 15, 20)$$',
            x=.5,
        ),
        yaxis=go.layout.YAxis(
            title=go.layout.yaxis.Title(
                text=r'$$\mathbb{P}(\xi=k)$$',
            ),
        ),
        xaxis=go.layout.XAxis(
            title=go.layout.xaxis.Title(
                text=r'$$k$$',
            ),
        ),
    ),
)
```

In this example we embed HTML in the final document (it may be useful for converting your markdown to HTML and then to PDF).

P.S.: Take a look at this repo; it may contain some great ideas as well.

Hello @mitinarseny ,

This is an interesting extension! I like your idea of an 'exporter' function.

Using this approach, we could generalize to Matplotlib and Plotly. Are there any other plotting libraries that you usually use through Python?

Thanks for your response!

As for your question:

> Are there any other plotting libraries that you usually use through Python?

I used to use Matplotlib, but then switched to Plotly because it can produce interactive plots while giving a lot of control over layout.

All right, so I'll extend support to Plotly "soon".

I was bored, so Plotly support is now live in pandoc-pyplot 2.2.0.0.

To use the Plotly rendering pipeline, simply use the plotly class instead of the pyplot class. Markdown example:

```{.plotly caption="This is a Plotly figure!"}
import plotly.graph_objects as go
fig = go.Figure(
    data=[go.Bar(y=[2, 1, 3])],
    layout_title_text="A Figure Displaying Itself"
)
```

Fun fact: pandoc-pyplot will dig in the Python globals() to find the figure, so the figure name can be anything!
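
As a rough illustration of what "digging in globals()" can look like, here is a small Python sketch. It is not the actual pandoc-pyplot capture code, which is not shown in this thread; the `Figure` class is a stand-in for `plotly.graph_objects.Figure` so the snippet runs without Plotly, and `find_figures` is a hypothetical helper.

```python
class Figure:
    """Stand-in for plotly.graph_objects.Figure (assumption, keeps the sketch dependency-free)."""


def find_figures(namespace: dict) -> list:
    """Return the names bound to Figure instances, in definition order."""
    return [name for name, value in namespace.items() if isinstance(value, Figure)]


# Simulate a user's code block that picks an arbitrary variable name.
user_namespace = {}
exec(
    "my_oddly_named_plot = Figure()\nunrelated = 42",
    {"Figure": Figure},  # globals visible to the code block
    user_namespace,      # names the code block defines end up here
)

names = find_figures(user_namespace)
print(names)  # ['my_oddly_named_plot']
```

A real implementation would also need a rule for blocks that define several figures, for example exporting only the last one.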

Only static exports are supported for now, i.e. fig.write_image(...).

Let me know what you think and we can tweak the Plotly support for future releases.

@LaurentRDC Thank you a lot for implementing this!

As for the interactive plots, I have the following idea:
As far as I know, it should be possible to get the output writer type during execution of the filter (this data should be passed along with metadata from pandoc). So, if the type is html(5), then embed the plot with an <embed> or <iframe> tag (see Plotly HTML export docs); otherwise, export it as a static image.

What do you think, is it possible to implement this as well?

BTW, can you share how the No Wasted Work feature is implemented? I'm just curious. And what is this feature really about?

  • As I understand it, the filter looks for changes in scripts and, if any change is found, executes the script again to regenerate the plot. Filters have to be stateless, so the information about the last script execution must be stored somewhere. If so, how is it implemented?

  • Does the filter regenerate the plot if at least one of the include scripts has changed?

  • Is it possible to force the filter to regenerate plots? This would be useful if your code uses non-deterministic functions (e.g. random.randint() or datetime.now()) whose result changes on every invocation.

Sorry, I should have tried the filter first :)
Here I can see the text of the program:
[image]
Maybe it would be better to use the .py extension instead of .txt, so that users can download the source file without having to change the extension afterwards?

But the following question is still unresolved for me:

> Is it possible to force the filter to regenerate plots? This would be useful if your code uses non-deterministic functions (e.g. random.randint() or datetime.now()) whose result changes on every invocation.

> BTW, can you share how the No Wasted Work feature is implemented? I'm just curious. And what is this feature really about?
>
>   • As I understand it, the filter looks for changes in scripts and, if any change is found, executes the script again to regenerate the plot. Filters have to be stateless, so the information about the last script execution must be stored somewhere. If so, how is it implemented?
>   • Does the filter regenerate the plot if at least one of the include scripts has changed?
>   • Is it possible to force the filter to regenerate plots? This would be useful if your code uses non-deterministic functions (e.g. random.randint() or datetime.now()) whose result changes on every invocation.

The feature works by considering everything that goes into making the figure, and computing the hash of that. This is why figure files have weird filenames; they are the hash values! The following information is hashed:

```haskell
-- | Datatype containing all parameters required to run pandoc-pyplot.
--
-- It is assumed that once a @FigureSpec@ has been created, no configuration
-- can overload it; hence, a @FigureSpec@ completely encodes a particular figure.
data FigureSpec = FigureSpec
    { caption      :: String           -- ^ Figure caption.
    , withLinks    :: Bool             -- ^ Append links to source code and high-dpi figure to caption.
    , script       :: PythonScript     -- ^ Source code for the figure.
    , saveFormat   :: SaveFormat       -- ^ Save format of the figure.
    , directory    :: FilePath         -- ^ Directory where to save the file.
    , dpi          :: Int              -- ^ Dots-per-inch of figure. This option only affects the Matplotlib backend.
    , renderingLib :: RenderingLibrary -- ^ Rendering library.
    , tightBbox    :: Bool             -- ^ Enforce tight bounding-box with @bbox_inches="tight"@. This option only affects the Matplotlib backend.
    , transparent  :: Bool             -- ^ Make figure background transparent. This option only affects the Matplotlib backend.
    , blockAttrs   :: Attr             -- ^ Attributes not related to @pandoc-pyplot@ will be propagated.
    } deriving Generic

instance Hashable FigureSpec -- From Generic
```

The hash takes into account everything, including changes in include scripts. The only way at this time to force recomputing of the figures is to delete the files, which are by default in the generated/ folder.
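
For readers who don't know Haskell, the same idea can be sketched in Python. This is an illustration only, not a port of the real code: the `FigureSpec` below keeps just a few of the fields, SHA-256 is swapped in for the `Hashable` instance, and `figure_path`, `render_if_needed`, and `fake_render` are hypothetical names.

```python
import hashlib
import os
import tempfile
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class FigureSpec:
    # Subset of the Haskell record's fields, renamed to Python style.
    caption: str
    script: str
    save_format: str
    dpi: int


def figure_path(spec: FigureSpec, directory: str) -> str:
    """Derive the output filename from a hash of everything in the spec."""
    digest = hashlib.sha256(repr(astuple(spec)).encode("utf-8")).hexdigest()
    return os.path.join(directory, f"{digest[:16]}.{spec.save_format}")


def render_if_needed(spec: FigureSpec, directory: str, render):
    """Call `render` only when no file for this exact spec exists yet."""
    path = figure_path(spec, directory)
    if os.path.exists(path):
        return path, False  # same inputs as before: reuse the existing file
    render(spec, path)
    return path, True


calls = []


def fake_render(spec: FigureSpec, path: str) -> None:
    # Placeholder for running the script and saving the figure.
    calls.append(path)
    with open(path, "w") as handle:
        handle.write(spec.script)


out_dir = tempfile.mkdtemp()
spec = FigureSpec(caption="demo", script="plt.plot([1, 2, 3])", save_format="png", dpi=80)
_, rendered_first = render_if_needed(spec, out_dir, fake_render)
_, rendered_again = render_if_needed(spec, out_dir, fake_render)
```

Because the state lives entirely in the filenames on disk, the filter itself stays stateless, and deleting a file is enough to force that figure to be rendered again.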

I personally would NOT want my figures to change based on, say, random number generation; this is why I seed my random number generator in an include script (example here).
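
A minimal sketch of that seeding pattern, using only the standard library. The seed value and the `noisy_samples` helper are made up for illustration; a NumPy-based script would seed NumPy's generator instead.

```python
import random


def noisy_samples(seed: int, n: int = 5) -> list:
    """Draw n pseudo-random numbers after seeding, as an include script would."""
    random.seed(seed)  # the line an include script would contain
    return [random.random() for _ in range(n)]


# Two "runs" of the same figure script see identical data.
first = noisy_samples(seed=2019)
second = noisy_samples(seed=2019)
print(first == second)  # True
```

With a fixed seed the plotted data is reproducible, so a cached figure and a freshly rendered one would show the same thing.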

> Sorry, I should have tried the filter first :)
> Here I can see the text of the program:
> [image]
> Maybe it would be better to use the .py extension instead of .txt, so that users can download the source file without having to change the extension afterwards?

The source code is stored in "txt" files because text files can be opened natively in the browser. Otherwise, if you clicked on the source code link, you would have to download the file and open it in a text editor. See here for an example.

> @LaurentRDC Thank you a lot for implementing this!
>
> As for the interactive plots, I have the following idea:
> As far as I know, it should be possible to get the output writer type during execution of the filter (this data should be passed along with metadata from pandoc). So, if the type is html(5), then embed the plot with an <embed> or <iframe> tag (see Plotly HTML export docs); otherwise, export it as a static image.
>
> What do you think, is it possible to implement this as well?

I will think about this. My main use-case for pandoc-pyplot is mostly PDF generation at this time. Embedding HTML in HTML is obviously fine, but it is not straightforward to handle HTML content when a LaTeX document is compiled to PDF.

> Embedding HTML in HTML is obviously fine, but it is not straightforward to handle HTML content when a LaTeX document is compiled to PDF.

What disadvantages do you see in the approach I described here (#4 (comment))?

> As far as I know, it should be possible to get the output writer type during execution of the filter (this data should be passed along with metadata from pandoc). So, if the type is html(5), then embed the plot with an <embed> or <iframe> tag (see Plotly HTML export docs); otherwise, export it as a static image.

BTW, is it possible to export a plot as SVG with the current implementation?

> BTW, is it possible to export a plot as SVG with the current implementation?

Yes, the following export formats are available:

```haskell
data SaveFormat
    = PNG
    | PDF
    | SVG
    | JPG
    | EPS
    | GIF
    | TIF
    deriving (Bounded, Enum, Eq, Show, Generic)
```

> > Embedding HTML in HTML is obviously fine, but it is not straightforward to handle HTML content when a LaTeX document is compiled to PDF.
>
> What disadvantages do you see in the approach I described here (#4 (comment))?
>
> > As far as I know, it should be possible to get the output writer type during execution of the filter (this data should be passed along with metadata from pandoc). So, if the type is html(5), then embed the plot with an <embed> or <iframe> tag (see Plotly HTML export docs); otherwise, export it as a static image.

Pandoc filters only work on the intermediate document representation. Filters don't know the source format, nor the output format. There may be games that can be played, but off the top of my head I'm not sure right now.

> Pandoc filters only work on the intermediate document representation.

As I see here, getting the output format is possible (I am not sure, because I don't know Haskell).

Correct me if I'm wrong.
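
For what it's worth, pandoc passes the name of the target format to a JSON filter as its first command-line argument, so the decision could be sketched like this in Python. The function names and the set of formats treated as "HTML-like" are my assumptions, not anything pandoc-pyplot defines.

```python
# Formats for which an embedded interactive plot makes sense (assumed list).
HTML_FORMATS = {"html", "html4", "html5"}


def target_format(argv: list) -> str:
    """Return the output format pandoc passed to the filter, or '' if absent."""
    return argv[1] if len(argv) > 1 else ""


def export_kind(fmt: str) -> str:
    """Pick an interactive export for HTML targets, a static image otherwise."""
    return "interactive" if fmt.lower() in HTML_FORMATS else "static"


# Simulated invocations, as if pandoc had called `filter.py <format>`.
print(export_kind(target_format(["filter.py", "html5"])))  # interactive
print(export_kind(target_format(["filter.py", "latex"])))  # static
```

A real filter would read `sys.argv` instead of the simulated lists; anything that is not an HTML target falls back to the static image path that already exists.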

Wow that's great, I didn't know! Thanks.

I'll close this issue and open another one for interactive Plotly plots.

@LaurentRDC How about not messing with classes (like {.pyplot} and {.plotly}) and instead deducing the library to use from the imports in the code block (or in the script included with {.pyplot include=imports.py})?

Using different classes simplifies the Haskell implementation. I think this generalizes better for multiple backends.
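
To show why the class-based route stays simple, here is a Python sketch of the lookup. It is only an analogy for the Haskell implementation, which is not shown here; `BACKENDS` and `pick_backend` are hypothetical names.

```python
from typing import Optional

# One entry per supported code-block class; adding a backend is one new line.
BACKENDS = {
    "pyplot": "matplotlib",
    "plotly": "plotly",
}


def pick_backend(classes: list) -> Optional[str]:
    """Return the backend for the first recognized class, or None."""
    for name in classes:
        if name in BACKENDS:
            return BACKENDS[name]
    return None  # not a block this filter should touch


print(pick_backend(["plotly", "numberLines"]))  # plotly
print(pick_backend(["haskell"]))                # None
```

Deducing the backend from imports would instead require parsing the script and any include files, and deciding what to do when both libraries are imported.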