saulpw/unzip-http

ZIP64 format not supported 😅

Closed this issue · 6 comments

For reference, ZIP64 Overview

Very large archives (larger than 4 GB) use the ZIP64 extension of the ZIP format instead of the classic layout.
This is the case for the following archive: 'http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2001.ZIP'

The advantage of this module is that it avoids downloading more than 4 GB of data when only a few kilobytes are needed.
However, because of the way it works, infolist() must support ZIP64 parsing, including the new structures with signatures PK\x06\x06 and PK\x06\x07.

Sample failing code:

import unzip_http

rzf = unzip_http.RemoteZipFile('http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2001.ZIP')
fp_contents = rzf.open("DATA/CURR_VERS_NAVI.TXT")

Traceback (most recent call last):
....
  File "C:\Work\dev\psa-maps\venv\lib\site-packages\unzip_http.py", line 60, in infolist
    struct.unpack_from(self.fmt_cdirentry, resp.data, offset=filehdr_index)
struct.error: offset -1514118702 out of range for 65536-byte buffer

Your version fails because it reads only the classic EOCD record, in which some fields are set to 0xFFFFFFFF to tell the ZIP parser that it must read the correct values from the ZIP64 EOCD record (EOCD64) instead.
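
To make the fallback concrete, here is a minimal sketch of how a parser could resolve the central directory location from the last bytes of an archive. It is not the code in unzip_http: the function name `find_central_directory` and its parameters are hypothetical, and it assumes the fetched tail already contains the EOCD, the ZIP64 locator (PK\x06\x07) and the ZIP64 EOCD record (PK\x06\x06). The field layouts follow the ZIP specification as I understand it.

```python
import struct

EOCD_SIG = b"PK\x05\x06"        # classic end of central directory record
EOCD64_LOC_SIG = b"PK\x06\x07"  # ZIP64 EOCD locator (20 bytes, sits just before the EOCD)
EOCD64_SIG = b"PK\x06\x06"      # ZIP64 EOCD record


def find_central_directory(tail, tail_offset):
    """Return (cd_offset, cd_size, entry_count).

    tail        -- bytes fetched from the end of the archive (hypothetical input)
    tail_offset -- absolute file offset of tail[0]
    """
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("EOCD record not found in the fetched tail")

    # sig, disk, cd_disk, entries_on_disk, entries, cd_size, cd_offset, comment_len
    fields = struct.unpack_from("<4sHHHHIIH", tail, eocd)
    count, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # No sentinel value: the classic record is authoritative.
    if count != 0xFFFF and cd_size != 0xFFFFFFFF and cd_offset != 0xFFFFFFFF:
        return cd_offset, cd_size, count

    # Sentinel found: the real values live in the ZIP64 EOCD record,
    # whose absolute offset is stored in the locator.
    loc = eocd - 20
    if loc < 0 or tail[loc:loc + 4] != EOCD64_LOC_SIG:
        raise ValueError("ZIP64 EOCD locator not found")
    _, _, eocd64_abs, _ = struct.unpack_from("<4sIQI", tail, loc)

    rec = eocd64_abs - tail_offset
    if rec < 0:
        raise ValueError("ZIP64 EOCD record starts before the fetched tail")
    if tail[rec:rec + 4] != EOCD64_SIG:
        raise ValueError("ZIP64 EOCD record signature mismatch")

    # sig, record_size, ver_made, ver_needed, disk, cd_disk,
    # entries_on_disk, entries, cd_size, cd_offset (the last four are 64-bit)
    rec_fields = struct.unpack_from("<4sQHHIIQQQQ", tail, rec)
    return rec_fields[9], rec_fields[8], rec_fields[7]
```

The individual central directory entries can carry the same 0xFFFFFFFF sentinels, with the real sizes and offsets stored in a ZIP64 extra field, so a complete fix would also need to handle those when walking the directory.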

Any chance of supporting this extended format?

I forked the project and tried to add ZIP64 support, but I'm running out of time (and won't have much spare time in the coming days). Eventually I will propose something, but you might be faster.

Hey @bousqi, thanks for the information and giving this a try! I think this format should definitely be supported and doesn't sound like it would be that difficult. Please do submit a PR if you wind up getting to it before I do.

Okay I got nerdsniped. Try this out, if it works for you we can cut a new release.

Damn! Back in the day I would have had more time to finish this quickly.
Let me give it a try and possibly open a new issue... or not 🤔

Works fine so far. Let me give it a shot on a wide range of archives :)

@saulpw: 123 GB across 128 zip files parsed in less than 3 minutes. So much bandwidth saved thanks to this module!