squidfunk/mkdocs-material

Built-in `privacy` plugin does not download external assets of embedded .html file

Closed this issue · 4 comments

Context

As discussed in #7596

I use an inline frame (<iframe>) to embed .html files in my documentation.
The .html files are generated by folium and download .js files from various CDNs.
The built-in privacy plugin is used for self-hosting of these external assets.

Bug description

The built-in privacy plugin does not download the external assets of the embedded .html file.

I would expect it to:

  • download the external assets to the site directory
  • replace all references with links to the downloaded copies

Reproduction

9.5.39-privacy-with-embedded-html.zip

Steps to reproduce

  1. run mkdocs serve
  2. check the .cache/plugin/privacy directory - the external assets of the embedded .html file are not downloaded
  3. check the docs/maps/bounding_box.html file - the references are not replaced with links to the downloaded copies

Thanks for reporting. While this might not be obvious, for MkDocs to consider your HTML file as something to process, it must be listed under extra_templates. If you add the following to your mkdocs.yml:

extra_templates:
  - maps/bounding_box.html

The privacy plugin will process it and download all assets:

INFO    -  Downloading external file: https://cdn.jsdelivr.net/npm/leaflet@1.9.3/dist/leaflet.js
INFO    -  Downloading external file: https://code.jquery.com/jquery-3.7.1.min.js
INFO    -  Downloading external file: https://cdn.jsdelivr.net/npm/bootstrap@5.2.2/dist/js/bootstrap.bundle.min.js
INFO    -  Downloading external file: https://cdnjs.cloudflare.com/ajax/libs/Leaflet.awesome-markers/2.0.2/leaflet.awesome-markers.js
INFO    -  Downloading external file: https://cdn.jsdelivr.net/npm/leaflet@1.9.3/dist/leaflet.css
INFO    -  Downloading external file: https://cdn.jsdelivr.net/npm/bootstrap@5.2.2/dist/css/bootstrap.min.css
INFO    -  Downloading external file: https://netdna.bootstrapcdn.com/bootstrap/3.0.0/css/bootstrap-glyphicons.css
INFO    -  Downloading external file: https://cdn.jsdelivr.net/npm/@fortawesome/fontawesome-free@6.2.0/css/all.min.css
INFO    -  Downloading external file: https://cdnjs.cloudflare.com/ajax/libs/Leaflet.awesome-markers/2.0.2/leaflet.awesome-markers.css
INFO    -  Downloading external file: https://cdn.jsdelivr.net/gh/python-visualization/folium/folium/templates/leaflet.awesome.rotate.min.css

I think this can be considered a design flaw in MkDocs, as all HTML files located in the docs_dir should probably be passed through the plugin pipeline automatically. However, I'm not the one to decide. Maybe you can run this by the maintainers of MkDocs and ask whether all files could be included automatically. We could also add an exception in the privacy plugin and try to detect *.html files in the docs_dir that are not explicitly listed under extra_templates, but honestly, we're fighting MkDocs on so many fronts that I don't want to create another battlefield.

Note that dynamically generated asset URLs in JavaScript (= map tiles) are not downloaded - it's not possible to know what to download and replace without executing the JavaScript. Marking as resolved via configuration.
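
To illustrate why dynamically generated URLs are out of reach, here is a minimal sketch that collects external URLs from static `src` and `href` attributes with Python's standard `html.parser`. It is not the privacy plugin's actual implementation; the class name `ExternalAssetCollector` and the sample HTML (including the tile URL) are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class ExternalAssetCollector(HTMLParser):
    """Collect external URLs from <script src> and <link href> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Only static attributes are visible here; script bodies are plain text
        wanted = {"script": "src", "link": "href"}.get(tag)
        for name, value in attrs:
            if name == wanted and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


# Hypothetical sample resembling a folium export
SAMPLE_HTML = """
<html><head>
<script src="https://cdn.jsdelivr.net/npm/leaflet@1.9.3/dist/leaflet.js"></script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/leaflet@1.9.3/dist/leaflet.css"/>
</head><body>
<script>
  L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png").addTo(map);
</script>
</body></html>
"""

collector = ExternalAssetCollector()
collector.feed(SAMPLE_HTML)
for url in collector.urls:
    print(url)
```

The two CDN references are found because they appear as attributes. The tile URL is only a string inside the inline script, so a static scan like this never sees it.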

My suggestion about the bug came from reading the source code:

# Find all external style sheet and script files that are provided as
# part of the build (= already known to MkDocs on startup)
for initiator in files.media_files():

The privacy plugin uses files.media_files() to find all media files detected by MkDocs.
The on_page... events process each Markdown documentation file.
So it made sense to me to also use files.static_pages(), as this is the "canonical" way to find HTML files.

def on_files(files, *, config):
    # Markdown sources that MkDocs renders as documentation pages
    print("Markdown Files:")
    for file in files.documentation_pages():
        print(" ", file.src_uri)

    # HTML files that MkDocs found in the docs_dir
    print("Static Pages:")
    for file in files.static_pages():
        print(" ", file.src_uri)

$ mkdocs serve
INFO    -  Building documentation...
INFO    -  Cleaning site directory
Markdown Files:
  index.md
Static Pages:
  maps/bounding_box.html

extra_templates seems to me to be a way to use Jinja2 features such as access to the context and variables.
This use case seems different, as the HTML file is an embed with external content, so it doesn't have to be a template 🤔
It might also be necessary to skip any static_pages that are already listed under extra_templates, so they are not processed twice 🤔
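
A minimal sketch of that de-duplication, assuming a hook file registered via `hooks` in mkdocs.yml. The helper name `unlisted_static_pages` is hypothetical; `files.static_pages()` and `file.src_uri` are the same calls used in the hook above, and `config["extra_templates"]` is assumed to hold the paths listed in mkdocs.yml. The hook only reports candidates and does not change what the privacy plugin processes.

```python
def unlisted_static_pages(static_uris, extra_templates):
    """Return static HTML pages that are not already listed as extra_templates."""
    listed = set(extra_templates)
    return [uri for uri in static_uris if uri not in listed]


def on_files(files, *, config):
    # Static pages known to MkDocs, e.g. maps/bounding_box.html
    static_uris = [file.src_uri for file in files.static_pages()]
    # Pages listed under extra_templates are already rendered as templates,
    # so they are skipped here to avoid handling them twice
    for uri in unlisted_static_pages(static_uris, config["extra_templates"]):
        print("Not listed under extra_templates:", uri)
    return files
```

Keeping the set difference in a plain function makes it easy to check without running a build.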

Yes, MkDocs does not consider HTML files to be media files but templates. Honestly, I'd consider this such an edge case that it's not worth trying to fix what's not broken. Two lines of config and it works.

We might add something to the documentation, though. PR appreciated ☺️

Thanks to both of you for your help ☺️
I've opened a PR with some additions to the docs.