golang/go

archive/zip: io.Reader like archive/tar

Closed this issue · 7 comments

I stream-read zips in Java and have a couple of Go projects that would benefit from the same capability. Although zips have a footer (the central directory) that allows random access, files are stored sequentially, which allows for streaming as well.
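For context, a minimal sketch of the asymmetry being described: archive/tar can walk entries from any io.Reader, while archive/zip's NewReader takes an io.ReaderAt plus a size, so a pure stream has to be buffered first. The helper names (tarNames, zipNames, sampleArchives) are made up for illustration and are not part of either package.

```go
package main

import (
	"archive/tar"
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// tarNames lists entry names from any io.Reader, one header at a time.
func tarNames(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

// zipNames has to buffer the whole stream first: zip.NewReader needs an
// io.ReaderAt and the total size so it can locate the trailing directory.
func zipNames(r io.Reader) ([]string, error) {
	buf, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf), int64(len(buf)))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// sampleArchives builds one tar and one zip in memory, each holding hello.txt.
func sampleArchives() (*bytes.Buffer, *bytes.Buffer, error) {
	body := []byte("hello")
	tarBuf, zipBuf := new(bytes.Buffer), new(bytes.Buffer)

	tw := tar.NewWriter(tarBuf)
	hdr := &tar.Header{Name: "hello.txt", Mode: 0o644, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, nil, err
	}
	if _, err := tw.Write(body); err != nil {
		return nil, nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, nil, err
	}

	zw := zip.NewWriter(zipBuf)
	w, err := zw.Create("hello.txt")
	if err != nil {
		return nil, nil, err
	}
	if _, err := w.Write(body); err != nil {
		return nil, nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, nil, err
	}
	return tarBuf, zipBuf, nil
}

func main() {
	tarBuf, zipBuf, err := sampleArchives()
	if err != nil {
		panic(err)
	}
	fmt.Println(tarNames(tarBuf))
	fmt.Println(zipNames(zipBuf))
}
```

Buffering works for small archives, but it defeats the purpose for large downloads, which is the gap this issue is about.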

minux commented

Impossible? Here's a Java snippet (the critical code) from an installer I wrote years ago. It reads large zip files and displays images from near the start of the archive while the rest of it downloads.

URLConnection connection = new URL(updateZipURL).openConnection();
BufferedInputStream in = new BufferedInputStream(new MyStream(connection.getInputStream()));
zipStream = new ZipInputStream(in);
ZipEntry entry;
while ((entry = zipStream.getNextEntry()) != null) {
    // If image, add to display queue, otherwise unpack to file.
}

API for ZipInputStream here: https://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipInputStream.html

To answer your first question, I want to port the installer to Go. I also have another project that would benefit from this library.

minux commented

Of course it won't work in the general case.

consider this:
zip a.zip malicious_stuff
zip b.zip good_stuff
cat a.zip b.zip > c.zip
zip -A c.zip

zipinfo c.zip or unzip c.zip will show only the good_stuff. A streaming
uncompressor, however, will definitely see the malicious_stuff, and it may or
may not see the good_stuff.

In general, a zip file can have arbitrary data prefixed at the front. How could a
streaming uncompressor deal with that?
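The concatenation scenario can be reproduced in memory as a rough sketch. The helpers below (makeZip, streamNames, directoryEntries) are hypothetical and deliberately naive: streamNames scans front to back for local file header signatures, as a simple streaming reader would, while directoryEntries reads only the entry count from the final end-of-central-directory record. It skips the offset fix-up that zip -A performs, and it assumes the archive has no trailing comment.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// makeZip returns a one-file archive; the body is stored uncompressed so the
// raw bytes stay predictable for the signature scan below.
func makeZip(name, body string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.CreateHeader(&zip.FileHeader{Name: name, Method: zip.Store})
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// streamNames mimics a naive front-to-back reader: it reports the file name
// stored in every local file header ("PK\x03\x04") it encounters.
func streamNames(data []byte) []string {
	sig := []byte("PK\x03\x04")
	var names []string
	off := 0
	for {
		i := bytes.Index(data[off:], sig)
		if i < 0 {
			return names
		}
		hdr := off + i
		if hdr+30 > len(data) {
			return names
		}
		// The name length sits at byte 26 of the 30-byte fixed header.
		nameLen := int(binary.LittleEndian.Uint16(data[hdr+26:]))
		if hdr+30+nameLen > len(data) {
			return names
		}
		names = append(names, string(data[hdr+30:hdr+30+nameLen]))
		off = hdr + 30 + nameLen
	}
}

// directoryEntries reads the total entry count from the end-of-central-directory
// record, assuming it is the last 22 bytes (that is, no archive comment).
func directoryEntries(data []byte) (int, error) {
	if len(data) < 22 {
		return 0, errors.New("data too short for a directory record")
	}
	eocd := data[len(data)-22:]
	if !bytes.Equal(eocd[:4], []byte("PK\x05\x06")) {
		return 0, errors.New("no end-of-central-directory record at end of data")
	}
	return int(binary.LittleEndian.Uint16(eocd[10:])), nil
}

func main() {
	a, err := makeZip("malicious_stuff", "evil payload")
	if err != nil {
		panic(err)
	}
	b, err := makeZip("good_stuff", "good payload")
	if err != nil {
		panic(err)
	}
	c := append(append([]byte{}, a...), b...) // cat a.zip b.zip > c.zip

	fmt.Println("streaming view:", streamNames(c))
	n, err := directoryEntries(c)
	fmt.Println("directory entry count:", n, err)
}
```

The streaming scan reports both names, while the trailing directory accounts for a single entry, which is the mismatch described above.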

I like your example. It shows the care and thoughtfulness that goes into the standard library. The Javadoc makes no mention of how false positives are caught (they probably aren't) or how they should be handled.

To deal with arbitrary prefixed data, the Next function could return errors:

PreviousFilesInvalidError: returned where a directory record ends but there is still more data in the stream. This would indicate two (or more) concatenated zip files, that all previous files were false, and that more files may be coming. This error could also be used when the stream is closed without a directory record, or that case could fire an InvalidFileFormatError or similar.

FirstNFilesInvalidError: could be used to catch cases where files were prepended without a directory record. Technically, this error could/should be used for the above case as well.
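A rough sketch of how a caller might consume such errors, treating every entry as provisional until the stream ends cleanly. Everything here is hypothetical: ErrPreviousFilesInvalid, the entrySource interface, and the scripted fake are illustrations of the proposal, not an existing API, and no real zip parsing takes place.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ErrPreviousFilesInvalid is a hypothetical sentinel for the proposal above:
// a directory record ended but more data follows, so every entry seen so far
// belonged to a prefix and should be discarded.
var ErrPreviousFilesInvalid = errors.New("sketch: previous files invalid")

// entrySource is a hypothetical stand-in for a streaming reader's Next method.
type entrySource interface {
	Next() (name string, err error)
}

// collect keeps entries provisional and drops them whenever the source
// reports that the files read so far were not part of the real archive.
func collect(src entrySource) ([]string, error) {
	var accepted []string
	for {
		name, err := src.Next()
		switch {
		case err == io.EOF:
			return accepted, nil
		case errors.Is(err, ErrPreviousFilesInvalid):
			accepted = accepted[:0] // discard the stale prefix
		case err != nil:
			return nil, err
		default:
			accepted = append(accepted, name)
		}
	}
}

// step is one scripted result; scripted replays steps in order and then
// returns io.EOF. It exists only to exercise collect.
type step struct {
	name string
	err  error
}

type scripted struct {
	steps []step
}

func (s *scripted) Next() (string, error) {
	if len(s.steps) == 0 {
		return "", io.EOF
	}
	st := s.steps[0]
	s.steps = s.steps[1:]
	return st.name, st.err
}

func main() {
	src := &scripted{steps: []step{
		{name: "malicious_stuff"},
		{err: ErrPreviousFilesInvalid},
		{name: "good_stuff"},
	}}
	names, err := collect(src)
	fmt.Println(names, err)
}
```

The cost of this design is that nothing can be trusted (or safely unpacked) until the end of the stream, which limits the benefit of streaming for untrusted input.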

Alternatively/Additionally, one could have explanations/warnings of streaming zips in the docs.

Thanks.

Hi Minux,

I think, given the concerns, I'll write my own third-party library and copy across bits from archive/zip.

Thanks for your time.

minux commented

Finally got around to putting something together:
https://github.com/krolaw/zipstream