danfickle/openhtmltopdf

-fs-table-paginate tends to leave only thead and tfoot before page break

VictorAtPL opened this issue · 7 comments

I need to convert the html with multiple tables, which has thead, tbody, tfoot. The tables should be paginated if needed.

Here is the result I could achieve:
openhtmltopdf_bug

I've already tried #354 with no success.

Could you please guide me what should I try and/or where can I find the code responsible for paginated tables? I will try to do some dirt fix for myself.

Yes, nested paginated tables are definitely buggy. I think it is related to the ContentLimitContainer (in the render package) and its use, but I don't fully understand the table code.

Another issue, which you may be able to solve is we don't have a minimal example showing the problem. An absolutely minimal sample would help a lot.

Yes, nested paginated tables are definitely buggy. I think it is related to the ContentLimitContainer (in the render package) and its use, but I don't fully understand the table code.

Do you know how this library works in overall? There is lack of information of it. I am debugging the code for the second consecutive day and still not sure how it works.

It looks like: HTML -> BlockBox -> Layer -> Ops -> PdfBox renderer? I still can't find the place in code where table is split across to pages. For me it looks like the table is queued to render on both pages, but there are some overlays/constraints which prevents the insufficient part of table to be rendered on proper page.

If I understand it right, there is one TableBox for each Table in html and later if there is need to split the page on multiple pages then more than one layer is created. Am I right?

Another issue, which you may be able to solve is we don't have a minimal example showing the problem. An absolutely minimal sample would help a lot.

If I don't succeed to do it using my private code, I will try to prepare a MWE.

Here is MWE:
https://gist.github.com/VictorAtPL/82e035e5f4e306aa692cb206d6624347

If you think I should make it even more minimal, I will try.
background-color and border statements can be of course removed if it will make debugging easier.

Preview:
image

I made a dirty fix in #416, although still hiding the border should be added and some test cases of other users should be performed.

For my purpose - it works perfectly, but it rather hides the symptom than treat the cause.
From what I understood, it can be problem with ContentLimitContainer as @danfickle said. Maybe the ContentLimitContainer doesn't take into account the header and footer, or one of them? Not sure, but unfortunately do not have time for digging more in this issue.

@danfickle
Any help on how to run this test: 884194c
And how does it work? It only asserts that the execution has succeeded or somehow check if the header and footer are hidden and I am missing something?

Ok, after a couple of hours of messing around in the code, it dawned on me that the core problem is that when the first table row is moved to the next page, the actual table is not also being moved.

So looking in TableRowBox, about 98 we see that this is meant to happen already.

It turns out the problem is the next function, isShouldMoveToNextPage, is returning false because it is not taking into account the page-break-inside: avoid but later on the table row is being moved anyway as page break algorithm for block like elements (in BlockBoxing) does take into account CSS page break properties.

So, in essence, the problem is one of duplicate functionality.

A quality fix which I'll work on tomorrow would factor out the duplicate functionality into a method. In the meantime a simple fix could be adding the following code at L118:

else if (getStyle().isAvoidPageBreakInside()) {
            return true;
}

P.S The idea of the visual regression tests is to output a PDF, render it to an image with PDFBOX and then compare it to a render of a proof PDF.

Since for this test there is no proof yet, it simple outputs a PDF to your target directory under /openhtmltopdf-examples/target/test/visual-tests/test-output/issue-399-table-header-with-no-rows.pdf.

Nice,
good to hear you found some more quality solution than mine to fix the core of the problem, not the result.

Looking forward for your PR.