saulpw/unzip-http

Extra 'P' char in file data after CRLF

bousqi opened this issue · 2 comments

First of all, thanks for this package. I was looking for a feature like this a few months ago but found nothing. Finally it exists.

I'm trying to read a few small text files from a large remote ZIP archive, to avoid downloading gigabytes when only a few kilobytes are needed.
The target URL is: 'http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2051.ZIP'

And I'm trying to get these two files:

ZIP_CONTENT_FILE = "DATA/CURR_VERS_NAVI.TXT"
ZIP_MAP_VERSION_FILE = "MAP.inf"

For both of them, reading line by line with readline(), I get an extra 'P' character after the '\r\n'.

For example:

import unzip_http

rzf = unzip_http.RemoteZipFile('http://psa.download.navigation.com/automotive/PSA/RT6-SMEGx/M49RG20-Q0420-2051.ZIP')
fp_contents = rzf.open("DATA/CURR_VERS_NAVI.TXT")

print(fp_contents.data)
b'CID:013,118,0,4,2020,"11/11/2020","MIDDLE_EUROPE","HERE"\r\nP'

On second thought, it might be the 'PK' header of the next ZIP entry that invited itself 😀

Good catch, @bousqi! That's exactly what was happening. It only shows up with stored files, because the extra character is absorbed by the decompressor otherwise. Thanks for filing the bug, should be fixed now. We'll release an updated version soon.
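
For anyone who wants to see the effect locally, here is a minimal sketch using only the Python standard library. It is not the code from unzip-http, and the thread does not show the actual fix. It assumes the bug was a read that went one byte past the end of the entry's data; the archive contents and the helper names (build_zip, read_entry) are made up for illustration. With a stored entry the extra byte is the 'P' of the next entry's 'PK' signature and ends up in the output; with a deflated entry the decompressor stops at the end of the stream and leaves the extra byte unused.

```python
import io
import struct
import zipfile
import zlib


def build_zip(compression):
    """Build a small in-memory archive with two entries (hypothetical names)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("first.txt", b"hello\r\n")
        zf.writestr("second.txt", b"world\r\n")
    return buf.getvalue()


def read_entry(raw, name, extra_bytes=0):
    """Return the raw entry bytes, optionally reading past the end on purpose."""
    info = zipfile.ZipFile(io.BytesIO(raw)).getinfo(name)
    # Local file header: 30 fixed bytes, then the file name and the extra field.
    # The two length fields sit at offsets 26 and 28 of the header.
    name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    end = start + info.compress_size + extra_bytes
    return raw[start:end]


# Stored entry: the over-read byte lands directly in the file data.
stored_data = read_entry(build_zip(zipfile.ZIP_STORED), "first.txt", extra_bytes=1)
print(stored_data)

# Deflated entry: the same over-read is ignored by the raw-deflate decompressor.
deflated_data = read_entry(build_zip(zipfile.ZIP_DEFLATED), "first.txt", extra_bytes=1)
decompressor = zlib.decompressobj(-15)  # -15 selects raw deflate, as used in ZIP
inflated = decompressor.decompress(deflated_data)
print(inflated, decompressor.unused_data)
```

Calling read_entry with extra_bytes=0 returns exactly the entry's data in both cases, which is the behaviour one would expect after the fix.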