danfickle/openhtmltopdf

A11y rendering NPE on valid markup

rhinowalrus opened this issue · 1 comments

Hello,

I’ve run into repeated NullPointerExceptions when generating reports with accessibility (and fast mode) turned on, even with seemingly innocuous HTML strings. For instance:

<div class="go-report-intro">
  <p>SOMETEXT</p>
</div>

results in an NPE, while

<div class="go-report-intro">
  <p class="pull-left">SOMETEXT</p>
</div>

does not (.pull-left only sets the CSS property 'float: left', but the markup is still in the correct reading order). The HTML we process appears valid, and it renders perfectly with A11y turned off.

From what we can tell, part of the hierarchy that the A11y tooling pieces together is becoming orphaned.

Here’s a snippet of the stack trace:

ERROR [calculationReportExecutor-1] HTMLtoPDFConvertor.createPDFReport(86) | Exception rendering pdf
java.lang.NullPointerException
	at org.apache.pdfbox.cos.COSArray.add(COSArray.java:62)
	at com.openhtmltopdf.pdfboxout.PdfBoxAccessibilityHelper.finishNumberTree(PdfBoxAccessibilityHelper.java:851)
	at com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.finish(PdfBoxFastOutputDevice.java:910)
	at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.writePDFFast(PdfBoxRenderer.java:661)
	at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.createPdfFast(PdfBoxRenderer.java:550)
	at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.createPDF(PdfBoxRenderer.java:468)
	at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.createPDF(PdfBoxRenderer.java:405)
	at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.createPDF(PdfBoxRenderer.java:387)
	at com.gossamer.voyant.report.HTMLtoPDFConvertor.createPDFReport(HTMLtoPDFConvertor.java:83)

I've gone through the A11y checklist in the wiki. I did notice that skipped heading levels cause the same exception, and I've got those corrected.
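For anyone hitting the skipped-heading variant, below is a rough sketch of the kind of pre-flight check that could catch it before rendering. It is not part of openhtmltopdf; the class name HeadingLevelCheck and the method findSkips are made up for illustration, and it assumes a plain regex scan is good enough (it does not ignore headings inside comments or scripts).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check; not part of openhtmltopdf. */
final class HeadingLevelCheck {
    // Matches opening tags <h1> .. <h6>, with or without attributes.
    // The lookahead keeps tags such as <hr> or <header> from matching.
    private static final Pattern HEADING =
            Pattern.compile("<h([1-6])(?=[\\s>/])", Pattern.CASE_INSENSITIVE);

    /** Returns one message per heading that jumps more than one level deeper. */
    static List<String> findSkips(String html) {
        List<String> problems = new ArrayList<>();
        int previous = 0; // 0 means no heading has been seen yet
        Matcher m = HEADING.matcher(html);
        while (m.find()) {
            int level = Integer.parseInt(m.group(1));
            if (level > previous + 1) {
                String from = previous == 0 ? "start of document" : "h" + previous;
                problems.add(from + " -> h" + level + " at offset " + m.start());
            }
            previous = level; // going back up (e.g. h3 -> h1) is allowed
        }
        return problems;
    }

    public static void main(String[] args) {
        // Example input only; h1 followed directly by h3 is reported.
        for (String problem : findSkips("<h1>Report</h1><h3>Details</h3>")) {
            System.out.println(problem);
        }
    }
}
```

Running something like this over the assembled HTML before handing it to the renderer would at least separate the heading-level failures from the orphaned-box ones.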

In a fork, we have added logging to help determine where in the markup the failures occur. We’ve tried checking for orphaned boxes and adding them back to the root when the above exception is detected, but this obviously affects the order in which items are read. Is this something that others have handled before?

For context, we have a robust report set with hundreds of templates that are put together dynamically based on customer selections. Things like order, branding, languages and the data used in our images generally differ between report generations, so narrowing down the pieces of markup that are not working correctly has so far been a challenge.
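One way the narrowing-down could be automated is a simple greedy reduction over the template fragments: drop one fragment at a time and keep the removal whenever the failure still reproduces. The sketch below is only an illustration of that idea. FragmentReducer and reduce are hypothetical names, and the fails predicate is assumed to be supplied by the caller (for example, render the joined fragments with A11y on and return true when the NullPointerException is thrown).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper for shrinking a failing set of markup fragments. */
final class FragmentReducer {

    /**
     * Greedily removes fragments while the failure still reproduces.
     * Assumes fails.test(fragments) is true for the initial list.
     */
    static List<String> reduce(List<String> fragments, Predicate<List<String>> fails) {
        List<String> current = new ArrayList<>(fragments);
        boolean removedAny = true;
        while (removedAny) { // repeat until a full pass removes nothing
            removedAny = false;
            for (int i = 0; i < current.size(); i++) {
                List<String> candidate = new ArrayList<>(current);
                candidate.remove(i);
                if (fails.test(candidate)) {
                    current = candidate; // fragment i was not needed to fail
                    removedAny = true;
                    i--; // the next fragment has shifted into index i
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
                "<h1>Title</h1>", "<p>ok</p>", "<div><p>SOMETEXT</p></div>");
        // Stand-in predicate for the example: pretend the <div> fragment triggers the failure.
        Predicate<List<String>> fails = list -> list.contains("<div><p>SOMETEXT</p></div>");
        System.out.println(reduce(fragments, fails));
    }
}
```

This needs up to one render per fragment per pass, so it is slow for hundreds of templates, but it runs unattended and ends with a fragment list in which every remaining piece is needed to trigger the failure.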

Thank you,

Ryan

syjer commented

Looks like the same issue as #401.

Can you provide a full (minimal) HTML file that reproduces the error? Maybe we can make the PdfBoxAccessibilityHelper more robust :)