phpdave11/gofpdi

gofpdi fails to correctly parse streams in some PDFs


When reading some PDFs (I have typically seen this when importing scanned PDFs), gofpdi fails to detect the 'endstream' token and panics with: panic: Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: dstream

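When reading a PDF stream, the reader should start reading stream data immediately after the single end-of-line marker (CRLF, or a lone LF) that follows the 'stream' keyword. Instead, it skips all leading whitespace. If the stream data itself begins with whitespace bytes, the reader starts too late and reads past the beginning of the 'endstream' token, which is consistent with the 'dstream' seen in the error.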
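To illustrate the expected behaviour, here is a minimal sketch of how the start of the stream data could be located. The function name streamDataStart and its signature are hypothetical and are not taken from gofpdi's code; the sketch only assumes that the raw PDF bytes are in a slice and that the offset just past the 'stream' keyword is known.

```go
package main

// streamDataStart returns the offset of the first byte of stream data.
// buf holds the raw PDF bytes and pos is the offset just past the
// "stream" keyword. (Hypothetical helper, not part of gofpdi.)
//
// Exactly one end-of-line marker is consumed: CRLF or a lone LF.
// Any further whitespace is treated as stream data and left in place.
func streamDataStart(buf []byte, pos int) int {
	// CRLF after the keyword: skip both bytes.
	if pos+1 < len(buf) && buf[pos] == '\r' && buf[pos+1] == '\n' {
		return pos + 2
	}
	// Lone LF after the keyword: skip one byte.
	if pos < len(buf) && buf[pos] == '\n' {
		return pos + 1
	}
	// No recognised end-of-line marker: leave the offset unchanged
	// rather than guessing how much to skip.
	return pos
}
```

With this approach, a stream whose data begins with whitespace (for example a newline or a space) keeps those bytes, so reading the declared number of bytes ends where the data ends rather than partway into 'endstream'.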

Here's a test PDF that exhibits the described behaviour:
BRW2C6FC94B5488_000827.pdf