danfickle/openhtmltopdf

Regression: words missing using text-align: justify in pdfbox 2.0.21

lagar84 opened this issue · 3 comments

Hello,

I noticed that "open-dev-v1" upgraded PDFBOX to version 2.0.21.

I was upgrading some libraries, so I gave PDFBOX 2.0.21 a try too.

Unfortunately, I found a regression.

I created a reproducer to illustrate the issue:

minimal_reproducer_missing_words.txt

Using openhtml 1.0.4 with PDFBOX 2.0.20 works:
[Screenshot: correct output, openhtml 1.0.4 with PDFBOX 2.0.20]

But using openhtml 1.0.4 with PDFBOX 2.0.21, not all words appear in the generated PDF:
[Screenshot: wrong output, openhtml 1.0.4 with PDFBOX 2.0.21]

I am not sure whether this issue is on the openhtml side or the PDFBOX side, so I am describing it here in the hope that someone with the proper knowledge can weigh in.

Thanks for this great library. Hope you can continue this great work.

Best wishes,
lagar84.

Huge thanks to @lagar84.

It turns out that PDFBOX 2.0.21 reports the non-breaking space (nbsp) as having zero width. As a result, our project calculates that the missing words fit on the first line. When the text is actually output, however, the nbsp does take up width and pushes the missing words into the page margin.
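To make that failure mode concrete, here is a toy greedy line breaker. It is a minimal sketch, not openhtmltopdf's actual layout code: the character widths, the `measure`/`break_lines` helpers, and the 34-unit line width are all hypothetical. The same words are laid out once with the real nbsp width and once with the zero width a misreporting font would give, and each resulting line is then measured again at the width it really occupies when drawn.

```python
# Toy model of the nbsp-width regression. All widths and helpers are hypothetical.
NBSP = "\u00a0"
SPACE_WIDTH = 3  # hypothetical advance width of " " and of a correctly measured nbsp
GLYPH_WIDTH = 5  # hypothetical advance width of every other character


def measure(text: str, nbsp_width: int) -> int:
    """Sum advance widths; nbsp_width simulates what the font reports for nbsp."""
    total = 0
    for ch in text:
        if ch == NBSP:
            total += nbsp_width
        elif ch == " ":
            total += SPACE_WIDTH
        else:
            total += GLYPH_WIDTH
    return total


def break_lines(words: list[str], max_width: int, nbsp_width: int) -> list[str]:
    """Greedy breaking: keep adding words to the current line while it still fits."""
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = f"{current} {word}" if current else word
        if not current or measure(candidate, nbsp_width) <= max_width:
            current = candidate  # fits (or is the first word on the line): keep it
        else:
            lines.append(current)  # does not fit: close this line, start a new one
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    words = ["aa" + NBSP + "bb", "cc", "dd"]
    max_width = 34

    # Layout with the real nbsp width versus the zero width from buggy metrics.
    good = break_lines(words, max_width, nbsp_width=SPACE_WIDTH)
    buggy = break_lines(words, max_width, nbsp_width=0)
    print("correct layout:", good)
    print("buggy layout:  ", buggy)

    # Re-measure each buggy line at the width it actually takes when drawn.
    for line in buggy:
        drawn = measure(line, nbsp_width=SPACE_WIDTH)
        print(repr(line), drawn, "OVERFLOW" if drawn > max_width else "ok")
```

With the real nbsp width, "cc" wraps to the second line. With the zero width, the layout thinks "aa bb cc" needs only 33 of the 34 available units and keeps "cc" on the first line; once drawn, that line is 36 units wide and runs past the right edge, mirroring the behavior described above.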

I have filed issue PDFBOX-4944 ("Built-in fonts are reporting nbsp char as having zero width") so that the PDFBOX team can address it. In the meantime, I have downgraded to PDFBOX 2.0.20 and added your test to our project.

Thanks again. You can leave this issue open until we can move to a new version of PDFBOX that includes the fix.

I have confirmed this is fixed with 2.0.22-SNAPSHOT.

Fixed with release of 1.0.6 and PDFBOX 2.0.22. Thanks again @lagar84.