betatim/notebook-as-pdf

Embedding local images via markdown doesn't work

rlleshi opened this issue · 11 comments

Hi, very helpful extension. Thanks for your work!

There is just one problem that I have noticed.
If I embed an image via HTML (Image(filename = "my_file.png")), it works perfectly, but if I embed it via Markdown (![title](my_file.png)), it does not show up in the PDF file.

Do you know how your image differs from the poppy field image in the example notebook https://github.com/betatim/notebook-as-pdf/blob/master/example.ipynb? That one works for me, does it also work for you?

Just gave it a try, it seems like referencing local files as images is the thing that doesn't work.

Looks like there are (at least) two issues:

  1. we write the HTML to a temporary directory. This means a reference to parrots.jpg won't point to the right place any more as that path is relative to the original notebook location
  2. headless chrome prevents access to local files for security. We can fix this by explicitly allowing it.

A solution to (1) might be to write the HTML to the directory containing the notebook but using a unique name so as to not interfere with existing files. Then all references to files would work.
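The idea for (1) could be sketched roughly like this. It is only an illustration, not the extension's actual code; the function name write_html_next_to_notebook is made up for this example, and the caller would still have to delete the file after the PDF is rendered.

```python
import os
import tempfile


def write_html_next_to_notebook(notebook_path, html):
    """Write html into the notebook's own directory under a unique name.

    Hypothetical helper: relative image paths in the HTML then resolve
    against the same directory as they do in the notebook.
    """
    notebook_dir = os.path.dirname(os.path.abspath(notebook_path))
    # NamedTemporaryFile picks a name that does not clash with existing files.
    # delete=False keeps the file around so the browser can load it later.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".html", dir=notebook_dir, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
        return handle.name
```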

Yeah, you are right, the problem seems to be referencing local files. Thanks for the suggestions

Is a possible fix for this the --to html_embed handler provided by nbextensions/export_embedded?

Nice find. We could add all images (and other things?) to the ipynb itself as base64 "data URIs". This means you'd get all the original images and such also in the notebook file which is attached to the PDF. This could be super useful when you later retrieve the file and have lost the context directory. A downside is that it can make the file very large.

For reference the code used by the export_embedded extension is https://github.com/ipython-contrib/jupyter_contrib_nbextensions/blob/b767c69a374f68d2a7272e4fe9e0a40a47cdb8f0/src/jupyter_contrib_nbextensions/nbconvert_support/embedhtml.py
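To illustrate what a base64 data URI looks like in practice, here is a minimal sketch using only the standard library. It is not taken from the export_embedded code linked above, and image_to_data_uri is a name invented for this example.

```python
import base64
import mimetypes


def image_to_data_uri(path):
    """Return the file at path encoded as a base64 data URI (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Fall back to a generic type when the extension is not recognised.
        mime_type = "application/octet-stream"
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Base64 encodes every 3 bytes as 4 characters, so each embedded image grows by about a third, which is where the file size concern comes from.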

"A downside is that it can make the file very large."

Could the extension have an nbextensions setting or toolbar toggle button that lets you tick a checkbox to select whether the notebook is attached to the exported PDF or not?

Would a RegEx search and replace on the notebook before it is passed to the HTML exporter be a valid option?

I have experimented with it and it seems to work fine (on Windows at least). See the example code below. It could also be expanded to work with both the Markdown syntax (![alt](file)) and <img> HTML tags.

I don't know if the performance penalty would be too big or if I am missing another drawback. What's your opinion?

code:

import re
import os

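# create demo notebook:
notebook = {}
# stand-in for the `resources` dict the exporter receives (assumption for this demo)
resources = {'metadata': {'path': os.getcwd()}}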
cell = {
    'cell_type': 'markdown',
    'source': r"""
        This is an image inserted via a Markdown image tag:

        ![A poppy field](https://unsplash.com/photos/sWlxCweDzzs/download?force=true&w=640 "A poppy field")

        ![a local file](my file.png)
        ![a file with Caption](file.png "test")
        ![Windows absolute](C:\file.png "test")
        ![Linux root](/file.png "test")
        ![Linux home](~/file.png "test")
    """
}
notebook['cells'] = [cell]

# real code:
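# alt text stops at the first ']' so several images on one line match separately;
# lower-case drive letters are also treated as absolute Windows paths
RE_local_Images = re.compile(r"!\[([^\]]*)\]\((?!https?://|[A-Za-z]:\\|/|~/)(.*?)( (\"|').*(\"|'))?\)")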

for cell in notebook['cells']:
    if cell['cell_type'] != 'markdown':
        continue

    offset = 0
    for match in RE_local_Images.finditer(cell['source']):
        path = match.group(2)
        fullpath = (os.path.realpath(os.path.join(resources['metadata']['path'], path))).replace(' ', '%20')
        cell['source'] = cell['source'][:match.start(2)+offset] + fullpath + cell['source'][match.end(2)+offset:]
        offset += len(fullpath)-(match.end(2)-match.start(2))

print(cell['source'])

The RegEx ignores all non-local types of path (that I could think of) and replaces the local file paths with full paths. Spaces in the path are URL-escaped, because Markdown doesn't accept them otherwise. Subfolders and ../ are also possible.
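The same idea extended to <img> tags might look something like the sketch below. This is untested against real notebooks; absolutize_img_sources and both pattern names are invented for the example, and it only handles quoted src attributes.

```python
import os
import re

# Matches src="..." or src='...' inside an <img> tag; group 3 is the path.
IMG_SRC = re.compile(r"""(<img\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)
# Paths left untouched: URLs, data URIs, absolute Windows/POSIX paths, ~/ paths.
NON_LOCAL = re.compile(r"^(?:https?://|data:|[A-Za-z]:\\|/|~/)")


def absolutize_img_sources(source, base_dir):
    """Rewrite relative <img src> paths in source to full paths under base_dir."""

    def replace(match):
        path = match.group(3)
        if NON_LOCAL.match(path):
            return match.group(0)  # leave non-local paths as they are
        full = os.path.realpath(os.path.join(base_dir, path)).replace(" ", "%20")
        quote = match.group(2)
        return f"{match.group(1)}{quote}{full}{quote}"

    return IMG_SRC.sub(replace, source)
```

Using re.sub with a callback avoids the manual offset bookkeeping, since each replacement is computed from its own match.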

I tested inserting the code into the project here:

def from_notebook_node(self, notebook, resources=None, **kwargs):
    html_exporter = HTMLExporter(config=self.config, parent=self)

and it seems to work great.

I am tagging you, @betatim, because I am not sure if you got a notification about the above comment.

Is there any progress on this? I find this a very useful package for reviewing / editing / commenting on notebooks on my iPad, but broken images are a bit of a deal breaker.

This is really easy to fix by passing embed_images=True; nevertheless, I created a pull request: #44.
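For anyone who wants to try the option outside the extension, a minimal sketch could look like this. It assumes an nbconvert release whose HTMLExporter supports embed_images, it is not the code from the pull request, and notebook_to_html_with_images is a made-up name.

```python
def notebook_to_html_with_images(notebook_path):
    """Convert a notebook to HTML with Markdown images embedded as base64."""
    # Imported here so the sketch only needs nbconvert when it is called.
    from nbconvert import HTMLExporter

    # Assumption: the installed nbconvert is new enough to know embed_images.
    exporter = HTMLExporter(embed_images=True)
    body, _resources = exporter.from_filename(notebook_path)
    return body
```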