boazsegev/combine_pdf

get PDF original metadata

MathieuDerelle opened this issue · 1 comments

Is there a way to get the original metadata of a PDF we are opening with your gem?

This line directly overwrites Producer:

@info[:Producer] = "Ruby CombinePDF #{CombinePDF::VERSION} Library"

Could you expose the original metadata as original_info, or maybe expose the parser?

Hi @MathieuDerelle,

You could always use the CombinePDF::PDFParser class manually before creating a new CombinePDF::PDF object, which lets you extract the information before it's overwritten.

As a quick sketch (untested):

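require 'combine_pdf'

parser = CombinePDF::PDFParser.new(IO.read(file_name, mode: 'rb').force_encoding(Encoding::ASCII_8BIT))
parser.parse # assumption: info_object is only populated once the data has been parsed
info = (parser.info_object || {}).dup # copy, so later writes to the parser's hash don't affect it
pdf = CombinePDF::PDF.new(parser)
puts info[:Producer] # => should contain the original producer value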

However, the producer should be updated if you save the PDF using CombinePDF. This makes it easier to track issues with PDF formatting.
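
Since the saved file will carry CombinePDF's Producer string, the original value has to be copied out before the library writes to the info hash. Here is a minimal sketch of that snapshot step on its own. `snapshot_info` is a hypothetical helper name, not part of CombinePDF; it only assumes that the object it receives responds to `info_object` and, optionally, `parse`. `FakeParser` is a stand-in used purely so the snippet runs without the gem, and the assumption that the info dictionary is filled in by `parse` may not hold for every version of the library.

```ruby
# Hypothetical helper (not part of CombinePDF): copy the info dictionary
# so that later writes to the parser's own hash do not affect the copy.
def snapshot_info(parser)
  # Assumption: the info dictionary is only filled in once #parse has run.
  parser.parse if parser.respond_to?(:parse)
  (parser.info_object || {}).dup
end

# Stand-in for a real parser, used only so this sketch runs without the gem.
FakeParser = Struct.new(:info_object) do
  def parse
    self.info_object ||= {}
    info_object[:Producer] ||= 'Original Producer'
    []
  end
end

parser = FakeParser.new(nil)
original = snapshot_info(parser)

# Simulate the library overwriting Producer in the shared hash afterwards.
parser.info_object[:Producer] = 'Ruby CombinePDF Library'

puts original[:Producer]            # the snapshot keeps the original value
puts parser.info_object[:Producer]  # the live hash shows the overwritten value
```

The copy is shallow, which is enough here because the library replaces the value stored under :Producer rather than mutating the original string in place.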

Good luck!
Bo.