golang/go

archive/zip: OpenReader(zipfile) simply says "zip: not a valid zip file" for jar files with extra bytes, which Java and unzip handle fine

davidmichaelkarr opened this issue · 9 comments

I wrote a small tool that uses "archive/zip" to look at some aspects of zip files. I've tested it with many zip and Java jar files, and it appears to work perfectly fine. I built it with Go v1.17 (not sure what patch version).

today I started working with a jar file from the "opentelemetry java" project, version 1.10.0. The jar file was provided to me by a member of a team in my organization. I don't know whether they got this directly from the opentelemetry Java github site or whether they assembled it themselves somehow. The opentelemetry Java site has that release (https://github.com/open-telemetry/opentelemetry-java/releases/tag/v1.10.0 ), but I don't see any binary artifacts.

When I run my tool against this jar file, it simply says

zip: not a valid zip file

with no other information. I stepped into the code, and that is the error returned from zip.OpenReader(zipfile).

I tried opening this jar file with "unzip", and it says the following:

warning [opentelemetry-javaagent.jar]:  3624 extra bytes at beginning or within zipfile
  (attempting to process anyway)

Except for this warning, it processes it perfectly fine. Java and the "jar" tool in the Java distribution don't even present a warning. It works fine in Java.

Note that you may have trouble repeating this if you just use the opentelemetry 1.10.0 jar file. I just tried downloading that jar file directly and running this test, and it doesn't have that problem. Somehow the jar file that ended up getting into my build process is slightly altered. We can deal with that, but I still think it's worth making archive/zip a little more robust, to let it deal with this somewhat nonstandard zip file.

opentelemetry-javaagent.zip

I'm attaching the jar file (renamed to zip so I can upload it). Again, I don't know how it got into this state, but except for archive/zip OpenReader failing on it, Java is perfectly fine with it, and unzip processes it with a warning.

raff commented

Given that the extra garbage at the beginning are HTTP headers I suspect you downloaded the file with something like curl -i -o output.zip <url> instead of curl -s -o output.zip <url>

I don't think anything should be done here, as noted above, the file has extra data (http headers) which isn't present in upstream source (https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/tag/v1.10.0), pointing to an error in your retrieval of the artifact.
Other tools like file also don't recognize it as a zip file.

thanm commented

@dsnet per owners.

As others have pointed out, the file in question has extra junk tacked onto the beginning, so it's not at all clear whether we would want to support this behavior. If we do allow extra junk, what are the rules? Is it ok to have extra junk just at the beginning, or can we have extra junk elsewhere as well? How many bytes of extra junk is OK and how much is too much? In your case you have about 3.6kbytes of extra junk-- what would happen if there were 5 MB of extra junk? If we allow extra junk in archive/zip files, what about ELF files or Go object files, should that be supported too?

I'm going to close out this issue; feel free to reopen if you disagree. Thanks.

Just so it's clear, both Java and Python can open and process this file without complaint.

I believe this is a dup of #10464

@ericchiang Thanks, I agree.

With https://go.dev/cl/387976 the prefixed zip file can be read. I verified that the Linux unzip program can read it, and the patch that I wrote should act the same way.

Change https://go.dev/cl/387976 mentions this issue: archive/zip: permit zip files to have prefixes