yob/pdf-reader

PDF::Reader::MalformedPDFError - after update to v2.10.0

kserhiyus opened this issue · 1 comment

Hello,
The gem is great and I'm quite a power user of it.

However, after updating from v2.9.2 to v2.10.0, some of my PDFs fail to process.
I checked the failing PDFs in several online validators and no issues were found.

The error I get comes from here: https://github.com/yob/pdf-reader/blob/main/lib/pdf/reader/cid_widths.rb#L55

Full error trace:

/opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/cid_widths.rb:55:in `parse_second_form': CidWidths: 3 must be less than 3 (PDF::Reader::MalformedPDFError)
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/cid_widths.rb:37:in `parse_array'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/cid_widths.rb:22:in `initialize'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/width_calculator/composite.rb:17:in `new'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/width_calculator/composite.rb:17:in `initialize'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:146:in `new'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:146:in `build_width_calculator'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:49:in `initialize'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:214:in `new'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:214:in `block in extract_descendants'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:213:in `map'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:213:in `extract_descendants'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:48:in `initialize'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:393:in `new'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:393:in `block in build_fonts'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:392:in `each'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:392:in `map'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:392:in `build_fonts'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:30:in `initialize'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/lib/pdf/item_receiver.rb:21:in `new'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/lib/pdf/item_receiver.rb:21:in `page='
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/validating_receiver.rb:258:in `call_wrapped'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/validating_receiver.rb:24:in `page='
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:268:in `block in callback'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:267:in `each'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:267:in `callback'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:158:in `walk'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/lib/pdf/processor.rb:37:in `block in extract_analyze_merge'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/lib/pdf/processor.rb:34:in `collect'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/lib/pdf/processor.rb:34:in `extract_analyze_merge'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/lib/pdf/processor.rb:27:in `block in <class:Processor>'
	from (eval):34:in `instance_exec'
	from (eval):34:in `__dry_initializer_initialize__'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/dry-initializer-3.0.4/lib/dry/initializer/mixin/root.rb:7:in `initialize'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/bin/pdfcb:81:in `new'
	from /opt/bitnami/ruby/lib/ruby/gems/2.6.0/gems/pdfcb-0.5.1/bin/pdfcb:81:in `<top (required)>'
	from /opt/bitnami/ruby/bin/pdfcb:25:in `load'
	from /opt/bitnami/ruby/bin/pdfcb:25:in `<main>'

To keep up with the update, I tweaked this line

raise MalformedPDFError, "CidWidths: #{first} must be less than #{final}" unless first < final

to be

raise MalformedPDFError, "CidWidths: #{first} must be less than #{final}" unless first <= final

and for me everything works as it did before the update.
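To illustrate why the relaxed comparison matters, here is a minimal, self-contained sketch. It is not the gem's actual code: `expand_width_range` and `MalformedWidthsError` are hypothetical names made up for this example, and it assumes the range `first..final` in a `first final width` entry is inclusive, which is what the `3 must be less than 3` message in the trace suggests (a range covering a single CID).

```ruby
# Hypothetical stand-in for PDF::Reader::MalformedPDFError, so the sketch
# runs without the gem installed.
class MalformedWidthsError < StandardError; end

# Expand one "first final width" entry into a { cid => width } hash.
# Hypothetical helper, not part of pdf-reader.
def expand_width_range(first, final, width)
  # first == final describes a single CID, so only first > final is rejected.
  unless first <= final
    raise MalformedWidthsError, "CidWidths: #{first} must not be greater than #{final}"
  end

  (first..final).each_with_object({}) { |cid, widths| widths[cid] = width }
end

# A single-CID range, like the 3..3 case from the trace, is accepted.
single = expand_width_range(3, 3, 500)

# A multi-CID range assigns the same width to every CID in it.
range = expand_width_range(1, 3, 250)
```

With the strict `first < final` check, the first call would raise instead of producing a one-entry hash. A reversed range such as `expand_width_range(4, 3, 100)` is still rejected.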

Here are the failing PDFs:
https://assets.publishing.service.gov.uk/media/5c640a8ded915d04148c31b0/Mr_J_Szymaniak_v_Jason_Hunt_and_Mardi_Hunt_trading_as_Crazy_Bear_Farm_and_Farm_Shop_-_3304471-2018.pdf
https://assets.publishing.service.gov.uk/media/5de917a2e5274a06d71f0413/Mr_J_Szymaniak_v_Jason_Hunt___Mardi_Hunt_TA_Crazy_Bear_Farm_and_Farm_Shop_-_3304471-2018_Judgment.pdf

Regards,
Serhii

yob commented

Thanks for the clear bug report.

That particular raise was added between v2.9.2 and v2.10.0, so this sounds like a bug, and I suspect your fix is what we need. Are you up for opening a PR? I'll get it merged.