aiidateam/archive-path

Lazy load of header index


In read mode, zipfile.ZipFile loads the entire index on initialisation (into a list of ZipInfo objects), which performs poorly for archives containing a large number of files (for a million archived files, the index can take ~1 GB of RAM).
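A minimal sketch of the eager behaviour, using a small in-memory archive. It relies on `ZipFile.filelist`, an attribute that exists in CPython but is not part of the documented API (the documented accessor is `infolist()`), so treat that detail as an assumption:

```python
import io
import zipfile

# Build a small in-memory archive with many empty entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(1000):
        zf.writestr(f"file_{i}.txt", b"")

# Re-open in read mode and inspect the index without asking for any member.
with zipfile.ZipFile(buf, "r") as zf:
    # `filelist` is an undocumented CPython attribute (assumption);
    # it is already fully populated right after the constructor returns.
    n_loaded = len(zf.filelist)

print(n_loaded)
```

Every entry already has a ZipInfo object in memory before any file has been requested, which is the cost described above.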

For tarfile.TarFile, the index is not read on initialisation, but the whole index is read (into a list of TarInfo objects) whenever tarfile.TarFile.getmember is called. There is also tarfile.TarFile.next(), which reads the next header and appends it to tarfile.TarFile.members.
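As a rough illustration of what a lazy lookup could look like for tar archives, the sketch below walks the headers with `next()` and stops at the first match. The helper name `find_member_lazily` is hypothetical (it is not part of tarfile or archive-path), and it assumes a freshly opened archive on which nothing has been read yet:

```python
import io
import tarfile


def find_member_lazily(tar, name):
    """Hypothetical helper: read headers one at a time until `name` is found.

    Returns the matching TarInfo, or None if the archive has no such member.
    """
    while True:
        info = tar.next()  # reads at most one more header
        if info is None:
            return None
        if info.name == name:
            return info


# Build a small uncompressed in-memory archive to try it on.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for i in range(100):
        data = b"x"
        info = tarfile.TarInfo(name=f"file_{i}.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    found = find_member_lazily(tar, "file_3.txt")
    # `members` only holds the headers read so far, not the full index.
    headers_read = len(tar.members)
```

Calling `tar.getmember("file_3.txt")` instead would load all 100 headers before returning, whereas here only the headers up to the match end up in `tar.members`.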

Ideally, for both formats, the index would only be read as far as it is needed (e.g. stopping once a particular file to open has been found).