Many PDF docs from MSWord do not open (file does not appear corrupted)
Brandon2255p opened this issue · 4 comments
Reporting an Issue Here
The attached PDF was generated from Microsoft Word. I ran it through online validators, and they report that it is valid PDF 1.3.
When running:
using (var pdfDocument = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
{
    CopyPages(pdfDocument, outPdf);
}
Opening the file throws an exception.
Expected Behavior
The file should open, because it is not corrupted.
Actual Behavior
"Invalid entry in XRef table, ID=8, Generation=0, Position=0, ID of referenced object=4, Generation of referenced object=0"
Steps to Reproduce the Behavior
using (var pdfDocument = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
{
    CopyPages(pdfDocument, outPdf);
}
We are having the same problem. Are you generating the PDFs in Word on macOS? That appears to be the culprit in our case.
The PDFs could very well have been generated in Word on a Mac. I did not create this one, nor can I trace who did, but we have run into the problem a few times so far. Good observation, thanks!
On a Mac you have two options when creating the PDF: best for printing, or best for online usage. If you select the "best for printing" option, you will not be able to open the file in PDFsharp. It is indeed unfortunate that Word creates an invalid PDF in that case, but the library has to tolerate such issues anyway; otherwise it is not really usable.