rendering a w3cdom document : infinite loop creation of TableCellBox
AlexisCothenet opened this issue · 4 comments
Hello,
I found an OOM but cannot understand the reason. There seems to be a cascade of TableCellBox instances created when rendering this HTML (I tried to keep it small, but it seems the number of td elements inside the first tr is required, as are the two other tr elements):
String bodyhtml=
"<table style=\"border-collapse:separate;border:none;padding:0;margin:0;table-layout:fixed;width:711px\" width=\"711\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n" +
"<tbody>\n" +
"<tr style=\"height:1px\">"+
"<td style=\"border:none;padding:0\" width=\"91\"></td>"+
"<td style=\"border:none;padding:0\" width=\"45\"></td>"+
"<td style=\"border:none;padding:0\" width=\"1\"></td>"+
"<td style=\"border:none;padding:0\" width=\"1\"></td>"+
"<td style=\"border:none;padding:0\" width=\"75\"></td>"+
"<td style=\"border:none;padding:0\" width=\"20\"></td>"+
"<td style=\"border:none;padding:0\" width=\"52\"></td>"+
"<td style=\"border:none;padding:0\" width=\"17\"></td>"+
"<td style=\"border:none;padding:0\" width=\"55\"></td>"+
"<td style=\"border:none;padding:0\" width=\"17\"></td>"+
"<td style=\"border:none;padding:0\" width=\"74\"></td>"+
"<td style=\"border:none;padding:0\" width=\"15\"></td>"+
"<td style=\"border:none;padding:0\" width=\"2\"></td>"+
"<td style=\"border:none;padding:0\" width=\"74\"></td>"+
"<td style=\"border:none;padding:0\" width=\"17\"></td>"+
"<td style=\"border:none;padding:0\" width=\"21\"></td>"+
"<td style=\"border:none;padding:0\" width=\"87\"></td>"+
"</tr>" +
"<tr style=\"height:4px\">" +
"<td style=\"font-style:normal;font-family:Arial;font-size:1px;color:#000000;background-color:#ffffff;text-align:Left;vertical-align:Top;word-wrap:break-word;overflow:hidden;border-collapse:separate;border:none;padding-left:2px;padding-right:2px;padding-top:1px;padding-bottom:1px\" colspan=\"2\" rowspan=\"2\"> </td>" +
"</tr>" +
"<tr style=\"height:34px\">" +
"<td style=\"font-style:normal;font-family:Arial;font-size:1px;color:#000000;background-color:#ffffff;text-align:Left;vertical-align:Top;word-wrap:break-word;overflow:hidden;border-collapse:separate;border:none;padding-left:2px;padding-right:2px;padding-top:1px;padding-bottom:1px\"> </td>" +
"</tr>" +
"</tbody></table>";
Document doc = Jsoup.parse(bodyhtml);
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "");
builder.toStream(outStream);
builder.run();
The version of openhtmltopdf used is 1.0.2 (with jsoup 1.13.1).
Hi @AlexisCothenet,
This bug is very concerning as it involves text breaking. I was able, after much trial and error, to reduce your test case to the following (no Jsoup needed):
<table style="width: 3px;table-layout: fixed;">
<tr>
<td colspan="2"></td>
<td style="word-wrap: break-word;">ABC</td>
</tr>
</table>
Now that I have narrowed it down to fixed table layout with colspan (or rowspan) and break-word, I'll try to find the root cause and fix it.
As always, thanks for reporting.
Hi @danfickle, it keeps looping inside https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-core/src/main/java/com/openhtmltopdf/layout/InlineBoxing.java#L160 .
It keeps trying to handle the "ABC" string: lbContext.isFinished() never returns true and lbContext.getStartSubstring().length() is never 0.
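To illustrate the failure mode described above, here is a simplified model of a line-breaking loop. It is not the actual InlineBoxing code: the class name, the method, and the fixed per-character width are all assumptions made for the sketch. The point is only that when no character fits in the available width, the loop has to consume at least one character anyway, otherwise the remaining text never shrinks and the loop never finishes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; not part of openhtmltopdf.
final class ZeroWidthBreakDemo {

    /**
     * Splits text into lines of at most availableWidth units, assuming every
     * character is exactly charWidth units wide (a toy font model).
     */
    static List<String> breakLines(String text, int availableWidth, int charWidth) {
        if (charWidth <= 0) {
            throw new IllegalArgumentException("charWidth must be positive");
        }
        List<String> lines = new ArrayList<>();
        String remaining = text;
        while (!remaining.isEmpty()) {
            // How many characters fit on this line (0 for a zero-width box).
            int fits = Math.min(remaining.length(), Math.max(0, availableWidth) / charWidth);
            // Progress guard: always consume at least one character. Without
            // this, fits == 0 would leave `remaining` unchanged forever.
            int take = Math.max(1, fits);
            lines.add(remaining.substring(0, take));
            remaining = remaining.substring(take);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(breakLines("ABC", 0, 1));   // [A, B, C]
        System.out.println(breakLines("ABCDE", 2, 1)); // [AB, CD, E]
    }
}
```

Replacing `Math.max(1, fits)` with plain `fits` reproduces the hang in this model for a width of 0, which matches the symptom of the start substring never reaching length 0.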
Well, this is embarrassing...
It turns out that reproducing it is as simple as:
<div style="width: 0; word-wrap: break-word;">ABC</div>
I.e. any zero-width box with content and break-word will trigger it. This is a significant bug so I'll try to do a release soon with the fix. In the meantime, avoid break-word or make sure you do not have any boxes with zero width (calculated or explicit) such as in tables.
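As a stopgap for the "avoid break-word" advice, one crude option is to strip the declaration from the HTML string before handing it to the renderer. The sketch below is only an assumption about how that could be done: the class and method names are hypothetical, it only handles the literal `word-wrap: break-word` declaration as it appears in the markup (inline styles or embedded style blocks), and it would also remove the same text if it happened to appear in visible content.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of openhtmltopdf.
final class BreakWordWorkaround {

    // Matches a `word-wrap: break-word` declaration, an optional trailing
    // semicolon, and any whitespace that follows it.
    private static final Pattern BREAK_WORD =
            Pattern.compile("word-wrap\\s*:\\s*break-word\\s*;?\\s*", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every matching declaration removed. */
    static String stripBreakWord(String html) {
        return BREAK_WORD.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<div style=\"width: 0; word-wrap: break-word;\">ABC</div>";
        System.out.println(stripBreakWord(html)); // <div style="width: 0; ">ABC</div>
    }
}
```

The cleaned string can then be parsed and rendered as before. External stylesheets are not covered by this approach and would need to be edited separately.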
And yes, I should have tested this edge case when implementing break-word.
Thanks everyone.
Hello @danfickle ,
Is a release planned soon for this problem?
Thank you.