Invalid xref stream for lazy: true
fulf opened this issue · 0 comments
Ruby: 2.5.1
Origami: 2.1.0
When trying to read some PDFs with lazy: true, the parser raises an exception and stops. The same PDFs are read without a problem with lazy: false and no errors are indicated.
Origami::PDF.read(pdf_content_stream, lazy: true, verbosity: Origami::Parser::VERBOSE_TRACE)
[info ] ...Reading header...
[error] Breaking on: "\xBF\xBD\xEF\xBF\xBD\x04|\r\xEF\xBF..." at offset 0x3445c
[error] Last exception: [Origami::InvalidObjectError] Object shall begin with '%d %d obj' statement
[debug] Skipping this indirect object.
[trace] Read Stream object, 33 0 R
Origami::Parser::ParsingError: Invalid xref stream
from /.rvm/gems/ruby-2.5.1/gems/origami-2.1.0/lib/origami/parsers/pdf/lazy.rb:159:in `parse_revision_from_xrefstm'
I've managed to trace the error to the snippet below: parse_object fails on its first attempt, logging the two [error] lines, and then successfully returns an Origami::Stream object. Of course Origami::Stream != Origami::XRefStream, so the exception is raised. Interestingly, though, XRefStream < Stream.
# lib/origami/parsers/pdf/lazy.rb:157
def parse_revision_from_xrefstm(revision)
  xrefstm = parse_object
  raise ParsingError, "Invalid xref stream" unless xrefstm.is_a?(XRefStream)
  # ...
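To illustrate why the check above rejects a plain Stream even though XRefStream < Stream holds, here is a minimal sketch. The two classes are empty stand-ins that only mirror the inheritance described in this issue (they are not Origami's real classes), and xref_stream? is a hypothetical helper name.

```ruby
# Stand-in classes mirroring the hierarchy described above.
# These are NOT Origami's classes; they only reproduce the inheritance.
class Stream; end
class XRefStream < Stream; end

# Hypothetical helper performing the same kind of check as the snippet above.
def xref_stream?(obj)
  obj.is_a?(XRefStream)
end

puts XRefStream < Stream           # true:  the subclass relation holds between the classes
puts xref_stream?(XRefStream.new)  # true:  an instance of the subclass passes
puts xref_stream?(Stream.new)      # false: an instance of the parent class does not
```

In other words, is_a? only walks up the ancestor chain of the object's own class, so an object built as the parent type never satisfies a check for the child type.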
I don't know much about PDF files, so I don't know whether this is working as intended. In any case, what would be the proper way to read such a file? Is there a better solution than the fallback below?
begin
  Origami::PDF.read(pdf_content_stream, lazy: true)
rescue Origami::Parser::ParsingError
  Origami::PDF.read(pdf_content_stream, lazy: false)
end
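If the fallback above ends up being used in several places, it could be wrapped in a small helper. The sketch below is only an illustration: read_with_fallback is a hypothetical name, the reader is passed in as a callable so the helper does not depend on Origami itself, and the rewind step rests on the assumption that the source may be an IO-like object already consumed by the first attempt.

```ruby
# Hypothetical helper (not part of Origami): try a lazy read first and
# fall back to an eager read when the given error class is raised.
#
# reader      - callable taking (source, lazy:) and returning the parsed document
# source      - a path or IO-like object
# error_class - the exception that should trigger the fallback
def read_with_fallback(reader, source, error_class)
  reader.call(source, lazy: true)
rescue error_class
  # Assumption: an IO-like source may have been consumed by the failed
  # attempt, so rewind it before retrying when that is possible.
  source.rewind if source.respond_to?(:rewind)
  reader.call(source, lazy: false)
end
```

With Origami, the callable would presumably be something like `->(src, lazy:) { Origami::PDF.read(src, lazy: lazy) }`, with `Origami::Parser::ParsingError` as the error class. Rescuing only that one class keeps unrelated failures (a missing file, for example) from being silently retried.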