simonhaenisch/md-to-pdf

feature: HTML Transformation

maciek-ibm opened this issue · 0 comments

Problem:

Sometimes the HTML needs to be transformed just before md-to-pdf generates the output, e.g. to convert images to Base64, sanitize links, add PagedJS, or inject some JavaScript.

Solution:

We achieved this by modifying part of the md-to-pdf code like this:

let html = getHtml(md, config);
if (config.transform_html) {
	html = await config.transform_html(html);
}
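
For illustration, here is a minimal, self-contained sketch of how such an optional hook behaves. Note that `renderHtml` is a hypothetical wrapper name, and the `getHtml` below is only a stand-in so the snippet runs on its own; it is not md-to-pdf's real implementation. `transform_html` is the option proposed in this issue, not an existing md-to-pdf setting.

```javascript
// Stand-in for md-to-pdf's internal getHtml (assumption: the real one
// renders Markdown; this one just wraps the input so the sketch runs).
function getHtml(md, config) {
  return `<html><body>${md}</body></html>`;
}

// Hypothetical wrapper showing where the proposed hook would be called.
async function renderHtml(md, config) {
  let html = getHtml(md, config);
  // Only call the hook when the user supplied a function; `await` handles
  // both sync and async transforms.
  if (typeof config.transform_html === 'function') {
    html = await config.transform_html(html);
  }
  return html;
}
```

Checking for a function (rather than just a truthy value) would avoid a confusing error if someone passes a string by mistake, and without the option the output stays unchanged.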

Where transform_html is a custom function, e.g.:

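const { JSDOM } = require('jsdom');

// NOTE: `config`, EmbedHTMLImages and replaceTextContentWithHref are defined elsewhere and not shown here.
async function transform_html(html) {
  const { window } = new JSDOM('<!DOCTYPE html>');
  if (!config.req) throw new TypeError('config.req not defined!');
  // NOTE: embeds images as base64
  const embedImages = new EmbedHTMLImages({ req: config.req });
  // Sanitize links first, then embed images in the resulting HTML
  const sanitizedLinksHtml = replaceTextContentWithHref(
    html,
    window.DOMParser,
    true
  );

  return await embedImages.run(sanitizedLinksHtml);
}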
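
Since the helpers used above are not shown, here is a rough standalone sketch of one possible transform that inlines local images as Base64 data URIs using only Node built-ins. `embedLocalImages`, `MIME_TYPES` and the `baseDir` parameter are hypothetical names for this example. The regex-based matching is a simplification (it only handles double-quoted `src` attributes); a DOM parser such as the jsdom approach above would be more robust.

```javascript
const fs = require('fs');
const path = require('path');

// Small, non-exhaustive extension-to-MIME map (assumption for this sketch).
const MIME_TYPES = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
};

// Hypothetical transform: replaces local <img src="..."> paths with data URIs.
function embedLocalImages(html, baseDir) {
  return html.replace(/(<img\b[^>]*?\bsrc=")([^"]+)(")/g, (match, before, src, after) => {
    // Leave remote and already-inlined images untouched.
    if (/^(https?:|data:)/i.test(src)) return match;
    const mime = MIME_TYPES[path.extname(src).toLowerCase()];
    const file = path.resolve(baseDir, src);
    // Skip unknown file types and missing files instead of throwing.
    if (!mime || !fs.existsSync(file)) return match;
    const data = fs.readFileSync(file).toString('base64');
    return `${before}data:${mime};base64,${data}${after}`;
  });
}

// Possible usage with the option proposed in this issue:
const exampleConfig = {
  transform_html: async (html) => embedLocalImages(html, process.cwd()),
};
```

Because the hook receives and returns a plain HTML string, several such transforms (link sanitizing, script injection, etc.) could be chained inside a single `transform_html` function.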