Extract Deflate64 ZIP archives with Python's zipfile API.
pip install zipfile-deflate64
Python 3.6, 3.7, 3.8, 3.9, and 3.10 are supported, with manylinux2014, macOS and Windows wheels published to PyPI.
Anywhere in a Python codebase:
import zipfile_deflate64 # This has the side effect of patching the zipfile module to support Deflate64
Alternatively, zipfile_deflate64 re-exports the zipfile API, as a convenience:
import zipfile_deflate64 as zipfile
zipfile.ZipFile(...)
...
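To check whether a given archive actually needs the patch, the following is a minimal sketch that uses only the standard library. It relies on the ZIP format's numeric compression method ID for Deflate64 (9); the helper name `deflate64_members` is hypothetical and is not part of zipfile or zipfile_deflate64. Listing members works even without Deflate64 support, since only reading member data requires a decompressor.

```python
import io
import zipfile

# Compression method 9 is Deflate64 in the ZIP format specification.
DEFLATE64_METHOD = 9


def deflate64_members(archive):
    """Return the names of members in an open ZipFile that use Deflate64."""
    return [
        info.filename
        for info in archive.infolist()
        if info.compress_type == DEFLATE64_METHOD
    ]


# Demonstration on an in-memory archive written without compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", "hello")

with zipfile.ZipFile(buffer) as zf:
    stored_only = deflate64_members(zf)  # no Deflate64 members in this archive
```

If the helper returns any names for a real archive, importing zipfile_deflate64 before extracting is what allows those members to be read.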
Recent versions of Microsoft Windows Explorer use Deflate64 compression when creating ZIP files larger than 2 GB. With the ubiquity of Windows and the ease of using "Send to compressed folder", a majority of newly-created large ZIP files use Deflate64 compression.
However, support for Deflate64 in the open-source ecosystem is very poor! Most ZIP libraries have declined to implement Deflate64, citing its proprietary nature.
In the .NET ecosystem, the ZipArchive API supports decompression only.
In Java, the Apache Commons Compress APIs support both compression and decompression.
The 7-Zip project probably provides the best general-purpose support for compressing and decompressing Deflate64, but there are several obstacles to general usability:
- 7-Zip itself is a Windows-only GUI application
- 7-Zip is still issuing new releases, but has declined to implement certain new compression formats, so the mcmilk/7-Zip-zstd fork is notable.
- p7zip, the POSIX-compatible CLI version (which does include Deflate64), has not had a release since 2016 and is likely unmaintained.
- p7zip does not build an API that external software can invoke for decompression.
- p7zip seems to now be living on as the jinfeihan57/p7zip fork, which is packaged by Arch Linux, amongst others.
- This seems to be active, and now can be built with CMake, but there's no support for building an external API.
- Many re-implementations of 7-Zip, such as py7zr for Python, do not support Deflate64.
In the Python ecosystem in particular, there have been several unfulfilled requests ( [1] [2] [3] ) for Deflate64 decompression support.
The best hope seems to be the infback9 extension to zlib. This was developed in 2003 by Mark Adler, an original author of zlib, and is kept in the source repository of zlib. However, it is not officially supported, contains no build tooling, and is not distributed with zlib packages. Additionally, infback9 provides only low-level support for working with Deflate64 bitstreams, with no support for the ZIP archive format (which is out of scope for zlib).
infback9's C-language API is relatively simple, but requires a non-trivial struct and function pointers for initialization and some explicit memory management operations (resizing allocated buffers and providing a Python-friendly malloc) to operate efficiently, so wrapping it with only ctypes seems to be inadequate.
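To give a sense of the function pointers involved, here is a rough ctypes sketch of the two callback prototypes that zlib.h declares as in_func and out_func. It is an illustration only, not how this package is implemented: it does not load zlib or call infback9 (which is not shipped in zlib binaries), and the names `IN_FUNC`, `OUT_FUNC`, `collected` and `_collect` are made up for this example.

```python
import ctypes

# Callback prototypes mirroring zlib.h's in_func and out_func typedefs.
IN_FUNC = ctypes.CFUNCTYPE(
    ctypes.c_uint,                                   # number of input bytes made available
    ctypes.c_void_p,                                 # opaque descriptor passed through by the caller
    ctypes.POINTER(ctypes.POINTER(ctypes.c_ubyte)),  # set by the callback to point at the input
)
OUT_FUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,                    # 0 on success, non-zero to abort
    ctypes.c_void_p,                 # opaque descriptor passed through by the caller
    ctypes.POINTER(ctypes.c_ubyte),  # decompressed bytes
    ctypes.c_uint,                   # number of decompressed bytes
)

collected = bytearray()


def _collect(out_desc, data, length):
    """Hypothetical output callback: copy each chunk into a Python bytearray."""
    collected.extend(ctypes.string_at(data, length))
    return 0


# The wrapper object must stay referenced for as long as C code may call it.
collect_cb = OUT_FUNC(_collect)
```

Even in this small form, every output chunk is copied into Python-owned memory and callback lifetimes have to be managed by hand, which hints at why a pure ctypes wrapper was judged inadequate.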
To manage ZIP archive extraction operations, the Python standard library zipfile module provides the essential features and is already ubiquitous in availability and usage. However, zipfile is difficult to extend, as it hardcodes many conditionals for compression formats and does not provide capabilities for easily augmenting or replacing parts of it. Monkey-patching can overcome some of these problems, and the promise of a drop-in, API-compatible patch to a standard library module outweighed the engineering benefits of basing a solution off a more naturally extensible third-party ZIP manipulation package.