Lazy load of header index
chrisjsewell commented
In read mode, zipfile.ZipFile loads the entire index on initialisation (into a list of ZipInfo), which performs very poorly for archives containing a large number of files (for a million archived files, the index can take ~1 GB of RAM).
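A minimal sketch of the zipfile behaviour, assuming a small in-memory archive (the file names and the count of 1000 are made up for illustration). It only shows that the full list of ZipInfo objects is already available right after opening, before any member has been requested:

```python
import io
import zipfile

# Build a small in-memory archive; names and count are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for i in range(1000):
        archive.writestr(f"file_{i}.txt", b"")

# Opening in read mode parses the central directory straight away.
buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as archive:
    # infolist() returns the list that was filled while opening;
    # no member has been looked up or read at this point.
    index = archive.infolist()
    print(len(index))
```

With a million entries, the same list holds a million ZipInfo objects, which is where the memory cost comes from.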
For tarfile.TarFile, the index is not read on initialisation, but it is read in full whenever tarfile.TarFile.getmember is called (into a list of TarInfo). There is also tarfile.TarFile.next(), which reads the next header and adds it to tarfile.TarFile.members.
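As a rough sketch of what a lazy lookup could look like with the existing tarfile API, the helper below calls next() until the wanted name appears and then stops. The helper name find_member_lazily and the in-memory test archive are hypothetical, and this assumes an uncompressed archive read from a seekable file object:

```python
import io
import tarfile


def find_member_lazily(archive, name):
    """Read headers one at a time and stop at the first match.

    Hypothetical helper: returns the matching TarInfo, or None if the
    end of the archive is reached without finding the name.
    """
    while True:
        info = archive.next()  # reads at most one further header
        if info is None:
            return None
        if info.name == name:
            return info


# Build a small in-memory archive; names and count are illustrative only.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for i in range(100):
        data = b"x"
        info = tarfile.TarInfo(name=f"file_{i}.txt")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    found = find_member_lazily(archive, "file_10.txt")
    # Only the headers read so far have been added to archive.members.
    headers_read = len(archive.members)
```

Here only the headers up to and including the match end up in members, whereas getmember would have loaded all 100 before returning.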
Ideally, for both formats, the index would only be read as far as it is needed (e.g. when searching for a particular file to open).