
Extract Deflate64 ZIP archives with Python's `zipfile` API.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

zipfile-deflate64


Installation

pip install zipfile-deflate64

Python 3.6, 3.7, 3.8, 3.9, and 3.10 are supported, with manylinux2014, macOS and Windows wheels published to PyPI.

Usage

Anywhere in a Python codebase:

import zipfile_deflate64  # This has the side effect of patching the zipfile module to support Deflate64

Alternatively, zipfile_deflate64 re-exports the zipfile API, as a convenience:

import zipfile_deflate64 as zipfile

zipfile.ZipFile(...)
...
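As a minimal sketch of the usage above, the helper below extracts an archive through the re-exported API. The function name `extract_archive` is hypothetical, and the fallback to the standard library when the package is missing is an illustrative assumption; with the fallback, Deflate64 members would still fail to extract.

```python
import pathlib

try:
    # Assumption: zipfile-deflate64 is installed. Importing it patches
    # zipfile, and the module re-exports the zipfile API.
    import zipfile_deflate64 as zipfile
except ImportError:
    # Fallback for illustration only: the unpatched standard library
    # cannot decompress Deflate64 members.
    import zipfile


def extract_archive(archive_path, dest_dir):
    """Extract every member of archive_path into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Because only the import line differs from ordinary `zipfile` code, existing extraction code should not need other changes.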

Design Rationale

The Problem

Recent versions of Microsoft Windows Explorer use Deflate64 compression when creating ZIP files larger than 2 GB. With the ubiquity of Windows and the ease of using "Send to > Compressed (zipped) folder", a majority of newly created large ZIP files use Deflate64 compression.

However, support for Deflate64 in the open-source ecosystem is very poor! Most ZIP libraries have declined to implement Deflate64, citing its proprietary nature.

In the .NET ecosystem, the ZipArchive API supports decompression only. In Java, the Apache Commons Compress APIs support both compression and decompression.

The 7-Zip project probably provides the best general-purpose support for compressing and decompressing Deflate64, but there are several obstacles to general usability:

In the Python ecosystem in particular, there have been several unfulfilled requests ([1], [2], [3]) for Deflate64 decompression support.
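To see whether a particular archive is affected, the compression method of each member can be read from the central directory without decompressing anything. The sketch below uses only the standard library; the constant `ZIP_DEFLATE64` and the function `deflate64_members` are hypothetical names, and the value 9 is the method number the ZIP format specification assigns to Deflate64.

```python
import zipfile

# Compression method 9 is Deflate64 in the ZIP format specification.
# The standard library defines no named constant for it, so this name
# is introduced here for illustration.
ZIP_DEFLATE64 = 9


def deflate64_members(archive):
    """Return the names of members compressed with Deflate64.

    `archive` is a path or a seekable binary file object. Only the
    central directory is read; no member data is decompressed.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type == ZIP_DEFLATE64
        ]
```

A non-empty result means the unpatched `zipfile` module would raise an error when asked to read those members.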

A Solution

The best hope seems to be the infback9 extension to zlib. It was developed in 2003 by Mark Adler, an original author of zlib, and is kept in the zlib source repository. However, it is not officially supported, contains no build tooling, and is not distributed with zlib packages. Additionally, infback9 provides only low-level support for working with Deflate64 bitstreams, with no support for the ZIP archive format (which is out of scope for zlib).

infback9's C-language API is relatively simple, but initialization requires a non-trivial struct and function pointers. Operating efficiently also requires some explicit memory management (resizing allocated buffers and providing a Python-friendly malloc), so wrapping it with only ctypes seems to be inadequate.

To manage ZIP archive extraction operations, the Python standard library zipfile module provides the essential features and is already ubiquitous in availability and usage. However, zipfile is difficult to extend: it hardcodes many conditionals for compression formats and provides no easy way to augment or replace parts of it. Monkey-patching can overcome some of these problems, and the promise of a drop-in, API-compatible patch to a standard library module outweighed the engineering benefits of basing a solution on a more naturally extensible third-party ZIP manipulation package.
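The monkey-patching idea can be sketched as follows. This is an illustration of the general technique, not necessarily how zipfile_deflate64 is implemented. It relies on `zipfile._get_decompressor`, a private CPython helper that may change between Python versions; `patched_decompressor`, `ZIP_DEFLATE64`, and `factory` are hypothetical names, and `factory` stands in for a callable returning an object that wraps infback9 and offers the decompressor interface zipfile expects.

```python
import contextlib
import zipfile

# Compression method number for Deflate64 in the ZIP format specification.
ZIP_DEFLATE64 = 9


@contextlib.contextmanager
def patched_decompressor(factory):
    """Temporarily route Deflate64 members to the object from factory().

    Assumption: zipfile looks up decompressors through the private
    function zipfile._get_decompressor(compress_type).
    """
    original = zipfile._get_decompressor

    def get_decompressor(compress_type):
        # Intercept only Deflate64; defer to zipfile for everything else.
        if compress_type == ZIP_DEFLATE64:
            return factory()
        return original(compress_type)

    zipfile._get_decompressor = get_decompressor
    try:
        yield
    finally:
        # Restore the original helper so the patch does not leak.
        zipfile._get_decompressor = original
```

A permanent, import-time patch of the kind described in Usage would perform the same assignment once and never restore it; the context manager here only keeps the sketch easy to undo.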