Regression: words missing using text-align: justify in pdfbox 2.0.21

Question

lagar84 opened this issue 4 years ago · 3 comments

lagar84 commented 4 years ago

Hello,

I noticed that "open-dev-v1" upgraded PDFBOX to version 2.0.21.

I was upgrading some libraries, so I gave PDFBOX 2.0.21 a try too.

Unfortunately, I found a regression.

I created a reproducer to illustrate the issue:

minimal_reproducer_missing_words.txt

Using openhtml 1.0.4 with PDFBOX 2.0.20, it works.

But using openhtml 1.0.4 with PDFBOX 2.0.21, not all words appear in the generated PDF.

I am not sure whether this issue is on the openhtml or the PDFBOX side, so I am describing it here in the hope that someone with proper knowledge can weigh in.

Thanks for this great library. Hope you can continue this great work.

Best wishes,
lagar84.

Answer 1 · 2020-08-29T08:37:28.000Z

Huge thanks to @lagar84.

It turns out that PDFBOX 2.0.21 reports the non-breaking space (nbsp) as having zero width. As a result, our project thinks the missing words fit on the first line. When the text is actually output, the nbsp does take up width and pushes the missing words into the page margin.
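To illustrate the mechanism, here is a minimal sketch in plain Java. It does not use the real PDFBOX or openhtml APIs: the class `NbspWidthSketch`, its methods, and all width values are hypothetical, chosen only to show how a zero-width nbsp makes a line-fit check pass for text that is too wide once rendered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not PDFBOX or openhtml code, and the widths are made up.
class NbspWidthSketch {
    static final char NBSP = '\u00A0';

    // Build a toy metrics table: every lowercase letter is 500 units wide,
    // a regular space is 250, and the nbsp width is whatever the caller reports.
    static Map<Character, Integer> metrics(int nbspWidth) {
        Map<Character, Integer> widths = new HashMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            widths.put(c, 500);
        }
        widths.put(' ', 250);
        widths.put(NBSP, nbspWidth);
        return widths;
    }

    // Sum the per-character widths; unknown characters count as zero.
    static int measure(String text, Map<Character, Integer> widths) {
        int total = 0;
        for (char c : text.toCharArray()) {
            total += widths.getOrDefault(c, 0);
        }
        return total;
    }

    // Layout decision: does the text fit in the available line width?
    static boolean fits(String text, Map<Character, Integer> widths, int lineWidth) {
        return measure(text, widths) <= lineWidth;
    }

    public static void main(String[] args) {
        String text = "one" + NBSP + "two" + NBSP + "three";
        int lineWidth = 5800;

        Map<Character, Integer> reported = metrics(0);   // nbsp reported as zero width
        Map<Character, Integer> rendered = metrics(250); // nbsp as it is actually drawn

        System.out.println("measured with reported metrics: " + measure(text, reported));
        System.out.println("layout thinks it fits: " + fits(text, reported, lineWidth));
        System.out.println("width when rendered: " + measure(text, rendered));
        System.out.println("really fits: " + fits(text, rendered, lineWidth));
    }
}
```

With these assumed numbers, layout measures 5500 units and accepts the line, while the rendered text is 6000 units wide and overruns a 5800-unit line, which is the same kind of mismatch that pushes words into the margin.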

I have filed issue PDFBOX-4944, "Built-in fonts are reporting nbsp char as having zero width", so that the PDFBOX team can address it. In the meantime, I have downgraded to PDFBOX 2.0.20 and added your test to our project.

Thanks again. You can leave this issue open until we move to a new version of PDFBOX.

Answer 2 · 2020-11-02T04:08:52.000Z

I have confirmed this is fixed with 2.0.22-SNAPSHOT.

Answer 3 · 2021-02-18T12:47:55.000Z

Fixed with the release of 1.0.6 and PDFBOX 2.0.22. Thanks again @lagar84.