danfickle/openhtmltopdf

Execution of JavaScript?

krabbenpuler opened this issue ยท 5 comments

Hello,

My usecase for OpenHTMLtoPDF involves user-controlled XMLs that are sent to a server that interprete the XML as SVG and embeds the SVG into a PDF. So far, OpenHTMLtoPDF does the job well. However, I'm still a bit concerned since SVGs - in principle - support the execution of JavaScript, which might be an issue.

Since changing the infrastructure to avoid users having control over the XML is not an option (sadly), I'd like to know whether OpenHTMLtoPDF does allow for JavaScript execution or not?

Thanks!

@krabbenpuler According to https://xmlgraphics.apache.org/batik/faq.html#what-scripting-languages-batik- Apache Batik (the SVG engine used) supports ECMAScript by default. No idea if it is possible to configure Batik to disable any scripting.

I would suggest to check if the input XML string contains any <script . If it contains this tag, then there is likely something strange going on. I would just throw an exception/reject the user input in that case.

Thanks guys,

The good news on scripts was that I could only get them to run by adding a hint to the transcoder:

pdfTranscoder.addTranscodingHint(SVGAbstractTranscoder.KEY_EXECUTE_ONLOAD, true);

and also adding the optional dependency on Rhino script engine, although that may already be on many classpaths.

The bad news is that external resource requests were enabled. Think file://passwords.txt type resources. I have now committed code that further disables scripts and also disables external resource requests by default. If a user wants those features, they can use an alternate constructor for BatikSVGDrawer.

I'll try to do a release with this work tomorrow.

P.S. Whitelisting SVGs seems to be very hard. I found a good article on SVG whitelisting when I was researching the issue. Fortunately, we can let Batik take care of securing SVGs as it hooks into our code when it's about to load a script or external resource (see commit).

Thank you both for the detailed answers!

The point @danfickle made was exactly my fear, namely that it is possible to read files from the filesystem, e.g. /etc/passwd and the like. But if I understand you correctly, you had to add this specific hint to the transcoder in order to make scripts work.

By default, scripts don't seem to work, at least with the implementation I'm currently faced with (that by the way does not add this hint to the transcoder). I tried several syntax variations as well as DOM1 and DOM2 (see this and this), nothing worked so far (luckily). Anyways, disallowing the execution of scripts alltogether is still prefered.

Thank you for updating the code that quickly! I'm looking forward to update OpenHTMLtoPDF in the near future.

Cheers!

@danfickle FYI: You might just wait some days before releasing, as Apache PDFBox 2.0.16 is currently in the release vote process, and will be released soon.

Just released RC21 with scripts and external resource requests disabled by default. So closing now. Thanks again everyone.

If anyone wants the old behavior:

builder.useSVGDrawer(new BatikSVGDrawer(SvgScriptMode.INSECURE_ALLOW_SCRIPTS, SvgExternalResourceMode.INSECURE_ALLOW_EXTERNAL_RESOURCE_REQUESTS));