SebastiaanKlippert/go-wkhtmltopdf

The pdf bottom text is split/cut-off into 2 pages

jagdevsingh9709 opened this issue · 1 comments

wkhtmltopdf version(s) affected: 0.12.6

OS information
CentOS 7

Description
We built a Docker image based on CentOS 7 for an application written in Go. The app accepts a website URL, reads the HTML at that URL, and returns a PDF generated by wkhtmltopdf. In the generated PDF, content at the bottom of a page (normal text, headers, tables, etc.) is cut off and split across two pages.

GoLang Version: 1.14.9
OS Version: CentOS 7
wkhtmltopdf: https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox-0.12.6-1.centos7.x86_64.rpm
wkhtmltopdf GoLang Library: https://github.com/SebastiaanKlippert/go-wkhtmltopdf/tree/v1.7.2
Dockerfile:

FROM centos:7
# Install fonts and wkhtmltopdf
RUN yum -y install xorg-x11-fonts-75dpi xorg-x11-fonts-Type1 dejavu-lgc-sans-fonts dejavu-lgc-sans-mono-fonts dejavu-lgc-serif-fonts mathjax-ams-fonts mathjax-caligraphic-fonts mathjax-fraktur-fonts mathjax-main-fonts mathjax-math-fonts mathjax-sansserif-fonts mathjax-script-fonts mathjax-size1-fonts mathjax-size2-fonts mathjax-size3-fonts mathjax-size4-fonts mathjax-typewriter-fonts mathjax-winchrome-fonts mathjax-winie6-fonts stix-fonts stix-math-fonts google-roboto-condensed-fonts google-roboto-fonts google-roboto-mono-fonts google-roboto-slab-fonts roboto-fontface-common roboto-fontface-fonts root-fonts
RUN yum -y install https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox-0.12.6-1.centos7.x86_64.rpm
WORKDIR /app
ADD assets assets
ADD configs configs
COPY --from=build-env /src/svc .
EXPOSE 8000
ENTRYPOINT ./svc

GoLang Code for wkhtmltopdf config:

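	// Create the generator; calling methods on a nil *PDFGenerator would panic.
	pdfg, err := wkhtmltopdf.NewPDFGenerator()
	if err != nil {
		return err // assumes the enclosing function returns an error
	}
	// Margins
	pdfg.MarginLeft.Set(0)
	pdfg.MarginRight.Set(0)
	page := wkhtmltopdf.NewPage(src)
	// Allow access to local images
	page.EnableLocalFileAccess.Set(true)
	// Use print media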
	page.PrintMediaType.Set(true)
	// Disable smart shrinking.
	page.DisableSmartShrinking.Set(true)
	// Set view port size
	page.ViewportSize.Set("840x600")

	// Add headers and footers
	if siteConf.HeaderHTML != "" {
		page.HeaderHTML.Set(siteConf.HeaderHTML)
		page.HeaderSpacing.Set(3.0)
	}
	if siteConf.FooterHTML != "" {
		page.FooterHTML.Set(siteConf.FooterHTML)
		page.FooterSpacing.Set(3.0)
	}

	// Make sure page loads completely
	page.WindowStatus.Set("done")
	page.NoStopSlowScripts.Set(true)
	// This occasionally fails for unknown reasons.
	page.RunScript.Set("MathJax.Hub.Queue(function(){window.status=\"done\";});")
	// This sets a hard timeout on pdf generation.
	page.RunScript.Set("setTimeout(function(){window.status=\"done\";},30000)")

	// Add to document
	pdfg.AddPage(page)

	// For debugging purposes
	// logFunc.Warn(pdfg.ArgString())

	// Create PDF document in internal buffer
	err = pdfg.Create()

How to reproduce
NA

Expected behavior
The contents of the pdf should be displayed properly. The text at bottom should not split/cut-off between 2 pages.

Possible Solution
NA

Issue1
Issue2

Hi, unfortunately this is not something that I can fix. I only provide the Go wrapper, and the same thing will happen when you call wkhtmltopdf directly. Their repo is at https://github.com/wkhtmltopdf/wkhtmltopdf

This kind of problem is always hard to diagnose, and the cause is not clear without seeing the actual HTML. In this case it looks like you are printing a single element, such as a table, and the CSS page-break rules are missing or incorrect. It can be solved with CSS properties like page-break-inside: avoid, but if you are printing arbitrary web pages then you cannot control their styling. If it is your own HTML then you can fix it with proper CSS and print media types.
It might also be a mismatch in page size (for example, the source using A4 and wkhtmltopdf using Letter).

Note that just printing HTML headers and paragraphs should not show this behaviour as far as I know, but as I said, you would really need to look at the source HTML to be able to say anything about the problem.