empira/PDFsharp

Corrupted File Error

Closed this issue · 4 comments

Reporting an Issue Here

Expected Behavior

After PdfReader.Open is called on the PDF file, the file is expected to open for reading.

Actual Behavior

It throws an error in the program below.
It would be nice to have some type of repair functionality, or the ability to open this PDF, since it opens in PDF viewer programs just fine.

Steps to Reproduce the Behavior

  1. Run the attached solution and it will error out.
    Zip attached. The PDF files that it errors out on are also attached.
    PDFsharp.IssueSubmissionTemplate.zip
    TestUser%20%202%202023W2.pdf
    TestUser202023W2.pdf

Tried to open the PDF with Adobe Reader and got this:

[Screenshot: Corrupt]

If the file does not open with Adobe Reader, then I assume there is something wrong with the file.

Looks like some sort of archive file containing several files. Should open fine with PDFsharp if your code removes the extra headers and trailers before sending the contents to PDFsharp.

A valid PDF file starts with %PDF-x.y, e.g. %PDF-1.5. Your file starts with
PK��� ô0òXœb4kV?� V?� � FormW2_TestUser2_782477.pdf%PDF-1.5
(open it in Notepad++)
The first PDF ends at line 1929 with %%EOF. On the next line a new PDF file begins
PK��� ô0òX”§ë¾@?� @?� � FormW2_estUser2_782478.pdf%PDF-1.5
Your file seems to be some kind of concatenation of 7 PDF files. I have never seen this before. It is interesting that some browsers can open it but Adobe Reader cannot. I tried to extract the first two PDF file parts with Notepad++, but Adobe Reader still cannot open the individual files.
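
The header check described above (a PDF starts with %PDF-, while this file starts with PK) can be done in code before handing the bytes to a PDF library. The following is only a minimal sketch in Python, not part of PDFsharp; the function name sniff_container is hypothetical, and it assumes the whole file has already been read into memory.

```python
PDF_MAGIC = b"%PDF-"        # every valid PDF header begins with these bytes
ZIP_MAGIC = b"PK\x03\x04"   # signature of a ZIP local file header


def sniff_container(data: bytes) -> str:
    """Classify raw file bytes by their leading signature.

    Hypothetical helper for illustration: returns "pdf", "zip", or "unknown".
    """
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A file that reports "zip" here should be unpacked first rather than passed directly to a PDF reader.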

What tool produces this file?

This is produced using Tax1099 through their GeneratePDF API. I will see if some pre-processing alleviates the issue.

Actually, both of the attached PDF files are ZIP files.
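
Since the attachments are ZIP archives, the pre-processing mentioned earlier could amount to unpacking the archive and keeping only the entries that really are PDFs. Below is a rough sketch in Python using the standard zipfile module; the function name extract_pdfs is hypothetical, and it assumes the downloaded bytes fit in memory.

```python
import io
import zipfile


def extract_pdfs(data: bytes) -> dict[str, bytes]:
    """Return {entry name: PDF bytes} for each PDF entry in a ZIP archive.

    Hypothetical helper for illustration. Entries are kept only if their
    content starts with the %PDF- header, regardless of file extension.
    """
    pdfs = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            payload = archive.read(info)  # decompressed entry content
            if payload.startswith(b"%PDF-"):
                pdfs[info.filename] = payload
    return pdfs
```

Each extracted byte string can then be wrapped in an in-memory stream and opened individually by the PDF library.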