gofpdi fails to parse streams correctly in some PDFs
napalu commented
When reading some PDFs (typically scanned-in PDFs), gofpdi fails to detect the 'endstream' keyword and panics with:
panic: Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: dstream
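When reading a PDF stream, the reader should start reading stream data immediately after the end-of-line marker that follows the 'stream' keyword (a CRLF, or a single LF). Instead it skips all leading whitespace, which can consume whitespace bytes that belong to the stream data and result in reading past the 'endstream' token.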
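To illustrate the difference, here is a minimal sketch in Go. It is not gofpdi's actual code: the function names `skipStreamEOL`, `skipAllWhitespace`, and `remainderAfter` are hypothetical, and it assumes the reader already knows the offset just past the 'stream' keyword and the stream's declared length.

```go
package main

// skipStreamEOL returns the offset of the first byte of stream data.
// pos is the offset just past the "stream" keyword. Only a single
// end-of-line marker (CRLF or a lone LF) is consumed; any further
// whitespace is treated as stream data. (Hypothetical helper.)
func skipStreamEOL(buf []byte, pos int) int {
	if pos+1 < len(buf) && buf[pos] == '\r' && buf[pos+1] == '\n' {
		return pos + 2
	}
	if pos < len(buf) && buf[pos] == '\n' {
		return pos + 1
	}
	// No valid end-of-line marker found: leave the position unchanged.
	return pos
}

// skipAllWhitespace mimics the behaviour described in this issue:
// it consumes every leading PDF whitespace byte, including bytes
// that are actually part of the stream data. (Hypothetical helper.)
func skipAllWhitespace(buf []byte, pos int) int {
	for pos < len(buf) {
		switch buf[pos] {
		case 0x00, '\t', '\n', '\f', '\r', ' ':
			pos++
		default:
			return pos
		}
	}
	return pos
}

// remainderAfter returns what is left in buf after reading length
// bytes of stream data starting at start, i.e. what the tokenizer
// would see when it looks for "endstream". (Hypothetical helper.)
func remainderAfter(buf []byte, start, length int) string {
	end := start + length
	if end > len(buf) {
		end = len(buf)
	}
	return string(buf[end:])
}
```

For the input `"stream\r\n \n\tAB\nendstream"` with a declared length of 5, `skipStreamEOL(buf, 6)` returns 8 and the remainder after the data is `"\nendstream"`. `skipAllWhitespace(buf, 6)` returns 11, so reading 5 bytes from there overshoots and leaves `"dstream"`, which matches the token in the panic message.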
Here's a test PDF that exhibits the described behaviour:
BRW2C6FC94B5488_000827.pdf