empira/PDFsharp

Many PDF docs from MSWord do not open (file does not appear corrupted)

Brandon2255p opened this issue · 4 comments

Reporting an Issue Here

The attached PDF was generated from Microsoft Word. I ran it through online validators, and they report it as a valid PDF 1.3 file.

Example.pdf

When doing

using (var pdfDocument = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
{
    CopyPages(pdfDocument, outPdf);
}

Opening the file throws an exception.

Expected Behavior

The file should open, because it is not corrupted.

Actual Behavior

"Invalid entry in XRef table, ID=8, Generation=0, Position=0, ID of referenced object=4, Generation of referenced object=0"

Steps to Reproduce the Behavior

using (var pdfDocument = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
{
    CopyPages(pdfDocument, outPdf);
}

We are having the same problem. Are you generating the PDFs in Word on macOS? That appears to be the culprit in our case.

The PDFs could very well have been generated in Word on a Mac. I did not create this one, and I cannot trace who did, but we have run into the problem a few times so far. Good observation, thanks!

On a Mac you have two options when creating the PDF: best for printing or best for online usage. If you select the "best for printing" option, the resulting file cannot be opened in PDFsharp. It is unfortunate that Word creates an invalid PDF in that case, but the library needs to tolerate such issues anyway; otherwise it is not really usable.
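Until the library tolerates these files, one way to keep a single bad PDF from aborting a whole merge is to guard the open call. The following is only a minimal sketch under assumptions: `PdfMergeSketch` and `TryAppend` are hypothetical names, `CopyPages` is a guess at what the helper in the report does, and the code catches `Exception` broadly because the thread does not state the exact exception type.

```csharp
using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class PdfMergeSketch
{
    // Hypothetical stand-in for the CopyPages helper from the report:
    // appends every page of the imported document to the output document.
    static void CopyPages(PdfDocument source, PdfDocument target)
    {
        for (int i = 0; i < source.PageCount; i++)
        {
            target.AddPage(source.Pages[i]);
        }
    }

    // Returns true if the pages were imported, false if opening or copying failed.
    // On failure, 'error' carries the exception message, for example the
    // "Invalid entry in XRef table" text quoted in the report.
    public static bool TryAppend(Stream pdfStream, PdfDocument outPdf, out string error)
    {
        try
        {
            using (var pdfDocument = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
            {
                CopyPages(pdfDocument, outPdf);
            }
            error = string.Empty;
            return true;
        }
        catch (Exception ex)
        {
            // Assumption: the exact exception type is not given in the thread,
            // so this sketch catches broadly and only reports the message.
            error = ex.Message;
            return false;
        }
    }
}
```

A caller could log `error` together with the file name and skip the document, or ask for it to be re-exported from Word with the "best for online usage" option mentioned above. This does not repair the file; it only isolates the failure.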