yob/pdf-reader

crop text in 'Tj' PagesStrategy::OPERATORS

msk-yv opened this issue · 1 comment

What I see in the PDF:
image
The text I see when I call page.text:

image

However, in page.raw_content I can see the full date text:
image

Can I be sure this is just the date format being cropped? Or is it a systematic problem, so that if '22.12.2019' appeared in that place I would get '22.12.20' instead of '22.12.19'?

yob commented

This is likely to be the fault of the primitive algorithm in PageLayout. I'd love to find time to improve it!

The algorithm sometimes positions characters so that they overlap, in which case some of the characters are left out of the output.
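
To make the overlap effect concrete, here is a toy sketch of how laying text out on a fixed character grid can drop characters. It is not the actual PageLayout code: the `Run` struct, the `layout_line` method, the column width and the sample strings are all made up for illustration. It only assumes that each piece of text is mapped to grid columns from its x position and that an occupied cell is never overwritten.

```ruby
# Toy illustration only: NOT the real PDF::Reader::PageLayout implementation.
# A run is a piece of text with an x position in points (hypothetical struct).
Run = Struct.new(:x, :text)

# Place runs onto a fixed-width character grid (hypothetical helper).
# A cell that is already occupied is not overwritten, so any character
# that lands on a taken cell is silently lost.
def layout_line(runs, col_width:, columns:)
  line = Array.new(columns, " ")
  runs.each do |run|
    start = (run.x / col_width).floor
    run.text.each_char.with_index do |char, i|
      col = start + i
      next if col >= columns        # character falls off the grid
      next unless line[col] == " "  # cell taken: character is dropped
      line[col] = char
    end
  end
  line.join.rstrip
end

# "Total" is placed first and claims columns 14..18, so the last two
# characters of the 16-character date run have nowhere to go.
runs = [Run.new(84.0, "Total"), Run.new(0.0, "Date: 22.12.2019")]
puts layout_line(runs, col_width: 6.0, columns: 40)
# prints "Date: 22.12.20Total"
```

In this sketch the loss is a plain truncation caused by the neighbouring run, not a date-format conversion, which is why '22.12.2019' comes out as '22.12.20' rather than '22.12.19'. The untouched operators in page.raw_content still contain the full string.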