galkahana/PDF-Writer

PDFParser::ParseFileDirectory,Unexpected object at xref start

huangtiansama opened this issue · 9 comments

I need to parse a PDF file, but parsing fails with the error above.

Hi huangtiansama, care to share the PDF so I can figure out why the xref start is read incorrectly?
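For context, the PDF specification has a file end with `startxref`, a byte offset, and `%%EOF`. At that offset a reader expects either the `xref` keyword (a classic cross-reference table) or an indirect object header `N G obj` (a cross-reference stream, PDF 1.5 and later). Anything else is presumably what PDFParser reports as "Unexpected object at xref start". The C++ sketch below is not PDF-Writer's implementation; `XrefStartKind` and `classifyXrefStart` are hypothetical names, and it assumes the whole file has already been read into a `std::string`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// What a reader finds at the offset named by "startxref".
enum class XrefStartKind { Table, Stream, Unexpected, Missing };

// Skip PDF whitespace (space, CR, LF, tab, form feed, NUL) starting at pos.
static std::size_t skipWhitespace(const std::string& data, std::size_t pos) {
    while (pos < data.size() &&
           (data[pos] == ' ' || data[pos] == '\r' || data[pos] == '\n' ||
            data[pos] == '\t' || data[pos] == '\f' || data[pos] == '\0'))
        ++pos;
    return pos;
}

// Read an unsigned decimal integer at pos; returns false if there is none.
static bool readUnsigned(const std::string& data, std::size_t& pos,
                         unsigned long long& value) {
    std::size_t start = pos;
    value = 0;
    while (pos < data.size() && std::isdigit(static_cast<unsigned char>(data[pos]))) {
        value = value * 10 + static_cast<unsigned long long>(data[pos] - '0');
        ++pos;
    }
    return pos > start;
}

// Hypothetical helper: find the last "startxref", read its offset, and
// report what sits at that offset in the file.
XrefStartKind classifyXrefStart(const std::string& data, unsigned long long& offset) {
    std::size_t kw = data.rfind("startxref");
    if (kw == std::string::npos) return XrefStartKind::Missing;
    std::size_t pos = skipWhitespace(data, kw + 9);  // 9 == strlen("startxref")
    if (!readUnsigned(data, pos, offset) || offset >= data.size())
        return XrefStartKind::Missing;

    std::size_t at = static_cast<std::size_t>(offset);
    // Classic cross-reference table: begins with the "xref" keyword.
    if (data.compare(at, 4, "xref") == 0) return XrefStartKind::Table;

    // Cross-reference stream: an indirect object header "N G obj".
    unsigned long long num = 0, gen = 0;
    std::size_t p = at;
    if (readUnsigned(data, p, num)) {
        p = skipWhitespace(data, p);
        if (readUnsigned(data, p, gen)) {
            p = skipWhitespace(data, p);
            if (data.compare(p, 3, "obj") == 0) return XrefStartKind::Stream;
        }
    }
    // Anything else means the recorded offset does not point at an xref.
    return XrefStartKind::Unexpected;
}
```

An offset that is off by a few bytes, such as the 8-byte shift discussed in this thread, will usually land in the middle of some other token and fall into the `Unexpected` case.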

OK, but you'll need to wait until Monday.

Sure, no problem. Thank you :)

Sorry, this file is too large to share here.
I've tried a workaround that jumps 8 bytes past "endobj", and so far it works fine.

Well, you could also contact whoever created the PDF and tell them they're producing faulty files.
Does it load in Acrobat?

Want to try sending it to me via email? Gal.kahana@hotmail.com

OK, the email has been sent

OK. It looks like a consistent 8-byte offset here.
Normally I'd tell you to go back to whoever created that PDF and get them to fix the positions, which you probably should do anyway if you can.
Still, it seems I can put together a fix that isn't horribly unsafe and still appears to read the file correctly.
Implemented here: #277.
If it works for you, I can merge it into the main branch.
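One way to keep a fixed-shift fallback reasonably safe is sketched below as an assumption; it is not the code in #277, which the thread doesn't show. For every in-use xref entry, check whether an `N G obj` header sits at the recorded offset, at the recorded offset plus a candidate shift, or at neither, and only apply the shift when it explains every entry. `XrefEntry`, `ShiftReport`, `checkShift`, and `shiftIsSafe` are hypothetical names; the file is assumed to be loaded into a `std::string`, and the entries are assumed to be already parsed from the xref table. The sign of `delta` depends on which way the offsets are off.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One in-use entry from the cross-reference table (assumed already parsed).
struct XrefEntry {
    unsigned long long objNum;  // object number
    long long offset;           // byte offset recorded in the xref
};

// How many entries a candidate shift explains.
struct ShiftReport {
    std::size_t atOffset = 0;  // header found at the recorded offset
    std::size_t atShift = 0;   // header found at recorded offset + delta
    std::size_t neither = 0;   // header found in neither place
};

// Skip PDF whitespace (space, CR, LF, tab, form feed, NUL).
static std::size_t skipWs(const std::string& d, std::size_t p) {
    while (p < d.size() && (d[p] == ' ' || d[p] == '\r' || d[p] == '\n' ||
                            d[p] == '\t' || d[p] == '\f' || d[p] == '\0'))
        ++p;
    return p;
}

// Read an unsigned decimal integer at p; returns false if there is none.
static bool readNum(const std::string& d, std::size_t& p, unsigned long long& v) {
    std::size_t start = p;
    v = 0;
    while (p < d.size() && std::isdigit(static_cast<unsigned char>(d[p])))
        v = v * 10 + static_cast<unsigned long long>(d[p++] - '0');
    return p > start;
}

// True if the header "objNum G obj" starts exactly at pos.
static bool headerAt(const std::string& d, long long pos, unsigned long long objNum) {
    if (pos < 0 || static_cast<std::size_t>(pos) >= d.size()) return false;
    std::size_t p = static_cast<std::size_t>(pos);
    unsigned long long num = 0, gen = 0;
    if (!readNum(d, p, num) || num != objNum) return false;
    p = skipWs(d, p);
    if (!readNum(d, p, gen)) return false;  // generation number
    p = skipWs(d, p);
    return d.compare(p, 3, "obj") == 0;
}

// Classify every entry against its recorded offset and the shifted offset.
ShiftReport checkShift(const std::string& d, const std::vector<XrefEntry>& entries,
                       long long delta) {
    ShiftReport r;
    for (const XrefEntry& e : entries) {
        if (headerAt(d, e.offset, e.objNum)) ++r.atOffset;
        else if (headerAt(d, e.offset + delta, e.objNum)) ++r.atShift;
        else ++r.neither;
    }
    return r;
}

// Conservative rule: trust the shift only if it explains every entry
// and no entry already points to the right place.
bool shiftIsSafe(const ShiftReport& r) {
    return r.atShift > 0 && r.atOffset == 0 && r.neither == 0;
}
```

Under this rule, a file where every entry is off by the same amount passes, while a file where some entries already point to the right positions gets `atOffset > 0` and is rejected. The sketch rejects such mixed files on purpose, since guessing per entry could silently pick up the wrong object.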

The solution didn't work: some parts of the xref do point to the right positions, so a single fixed shift can't be applied safely. This PDF needs to be corrected first (Adobe Acrobat can do that, by opening and closing the file). After that, it can be used with this library.