Lower level ZIP support for Java

LLJ-ZIP

A closer-to-the-spec implementation of ZIP parsing for Java.

Relevant ZIP information

Official spec

The notes and structure outlines are the basis for most of LLJ-ZIP.

JVM zip parsing & JLI

The JVM zip reader implementation is based on this piece.

This is a zip format reader for seekable files, that tolerates leading and trailing garbage, and tolerates having had internal offsets adjusted for leading garbage (as with Info-Zip's zip -A).

But that's not all it does. That's just what that one comment says. Some other fun quirks of the JVM zip parser:

  • The end of central directory record is found by scanning from the end of the file, rather than from the beginning.
  • The central directory values are authoritative. Names/values defined by the local file headers are ignored.
  • The file data of local file headers is not size bound by the file header's compressed size field. Instead, it uses the central directory header's declared size.
  • Class names are allowed to end in a trailing /, which most tools interpret as a directory.
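
To make the first quirk concrete, here is a minimal sketch of backward versus forward scanning for the end of central directory signature. It is not the JVM's or LLJ-ZIP's actual code: the class EocdScan and its methods are hypothetical names, and it only checks the 4-byte signature (0x06054b50) without validating the rest of the record. When a file contains more than one candidate record, the two scan directions can land on different offsets.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class EocdScan {
    // "PK\5\6" read as a little-endian int, per the ZIP spec.
    static final int EOCD_SIGNATURE = 0x06054b50;
    // Size of the fixed part of the record (no archive comment).
    static final int EOCD_MIN_LENGTH = 22;

    // Backward scan: returns the offset of the LAST signature, or -1.
    static int findLastEocd(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = data.length - EOCD_MIN_LENGTH; i >= 0; i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) return i;
        }
        return -1;
    }

    // Forward scan: returns the offset of the FIRST signature, or -1.
    static int findFirstEocd(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i <= data.length - EOCD_MIN_LENGTH; i++) {
            if (buf.getInt(i) == EOCD_SIGNATURE) return i;
        }
        return -1;
    }

    // Builds a zero-filled buffer with the signature written at each offset.
    static byte[] sample(int length, int... signatureOffsets) {
        byte[] data = new byte[length];
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        for (int offset : signatureOffsets) buf.putInt(offset, EOCD_SIGNATURE);
        return data;
    }

    public static void main(String[] args) {
        // 3 bytes of leading garbage followed by two 22-byte records.
        byte[] data = sample(3 + 22 + 22, 3, 25);
        System.out.println("forward:  " + findFirstEocd(data));
        System.out.println("backward: " + findLastEocd(data));
    }
}
```

A reader that scans forward would treat the record at offset 3 as the end of the archive, while one that scans backward would pick the record at offset 25.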

Additional features

  • Reads ZIP files using Unsafe backed mapped files.
    • Using FileChannel.map yields a MappedByteBuffer, which is indexed with int values, limiting mapped files to about 2GB
    • Our UnsafeMappedFile implementation uses long addressing, which supports files far beyond that 2GB limit
  • Highly configurable, offering 3 ZIP reading strategies out of the box (See ZipIO for convenience calls)
    • Std / Forward scanning: Scans for EndOfCentralDirectory from the front of the file, like many other tools
    • Naive: Scans only for LocalFileHeader values from the front of the file, the fastest implementation, but obviously naive
    • JVM: Matches the behavior of the JVM's ZIP parser, including a number of odd edge cases. Useful for opening JAR files to mirror java -jar <path> behavior.
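
As a rough illustration of the idea behind the naive strategy, the sketch below walks a byte array from the front and records every offset where the local file header signature (0x04034b50) appears. This is an assumption-laden toy, not the NaiveLocalFileZipReader implementation: the class NaiveLocalHeaderScan is a hypothetical name, the sample archive is built with the JDK's ZipOutputStream, and matching only on the signature means file data that happens to contain those four bytes would be reported as a header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class NaiveLocalHeaderScan {
    // "PK\3\4" read as a little-endian int, per the ZIP spec.
    static final int LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;

    // Forward scan collecting every offset that starts with the signature.
    static List<Integer> findLocalHeaderOffsets(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + 4 <= data.length; i++) {
            if (buf.getInt(i) == LOCAL_FILE_HEADER_SIGNATURE) offsets.add(i);
        }
        return offsets;
    }

    // Builds a small two-entry archive in memory using the JDK writer.
    static byte[] sampleZip() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(findLocalHeaderOffsets(sampleZip()));
    }
}
```

Because the scan never consults the central directory, it is cheap, but it also cannot tell a genuine entry from bytes that merely look like one.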

Usage

Maven dependency:

<dependency>
    <groupId>software.coley</groupId>
    <artifactId>lljzip</artifactId>
    <version>${zipVersion}</version> <!-- See release page for latest version -->
</dependency>

Gradle dependency:

// Map notation
implementation group: 'software.coley', name: 'lljzip', version: zipVersion
// Or the equivalent string notation (use one or the other)
implementation "software.coley:lljzip:${zipVersion}"

Basic usage:

// ZipIO offers a number of different utility calls for using different ZipReader implementations
ZipArchive archive = ZipIO.readJvm(path);

// Local files have the actual file data/bytes.
// These entries mirror data also declared in central directory entries.
List<LocalFileHeader> localFiles = archive.getLocalFiles();
for (LocalFileHeader localFile : localFiles) {
    // Data model mirrors how a byte-buffer works.
    ByteData data = localFile.getFileData();
    
    // You can extract the data to raw byte[]
    byte[] decompressed = ZipCompressions.decompress(localFile);
    
    // Or do so with a specific decompressor implementation
    byte[] decompressedWithDeflate = localFile.decompress(DeflateDecompressor.INSTANCE);
}

// Typically used for authoritative definitions of properties.
// Some ZIP logic will ignore properties of 'LocalFileHeader' values and use these instead.
//  - Try using a hex editor to play around with this idea. Plenty of samples in the test cases to look at.
List<CentralDirectoryFileHeader> centralDirectories = archive.getCentralDirectories();

// Information about the archive and its contents.
EndOfCentralDirectory end = archive.getEnd();

For more detailed example usage see the tests.

How does each ZipReader implementation map to standard Java ZIP handling?

If you're looking to see which implementation models different ways of reading ZIP files in Java, here's a table for reference:

Java closest equivalent | LL-Java-Zip
ZipFile                 | JvmZipReader / ZipIO.readJvm(...)
ZipInputStream          | ForwardScanZipReader / ZipIO.readStandard(...)
N/A                     | NaiveLocalFileZipReader / ZipIO.readNaive(...)
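
For reference, the two JDK classes in the left column can be exercised side by side with nothing but java.util.zip. The sketch below is only an illustration of the standard library: JdkZipReaders is a hypothetical class name, and the sample archive is a well-formed one written by ZipOutputStream, so both readers are expected to agree. The differences that LLJ-ZIP models only show up on archives where the local file headers and central directory disagree.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class JdkZipReaders {
    // Writes a small two-entry archive to a temporary file.
    static Path writeSampleZip() {
        try {
            Path path = Files.createTempFile("sample", ".zip");
            path.toFile().deleteOnExit();
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(path))) {
                zip.putNextEntry(new ZipEntry("a.txt"));
                zip.write("hello".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
                zip.putNextEntry(new ZipEntry("b.txt"));
                zip.write("world".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ZipFile needs a seekable file and lists entries from the central directory.
    static List<String> namesViaZipFile(Path path) {
        List<String> names = new ArrayList<>();
        try (ZipFile zipFile = new ZipFile(path.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) names.add(entries.nextElement().getName());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // ZipInputStream reads front to back, one local file header at a time.
    static List<String> namesViaZipInputStream(Path path) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(path))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) names.add(entry.getName());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Path path = writeSampleZip();
        System.out.println("ZipFile:        " + namesViaZipFile(path));
        System.out.println("ZipInputStream: " + namesViaZipInputStream(path));
    }
}
```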

Building

Due to some sun.misc.Unsafe hacks (for performance and long addressing), you will get compiler warnings when first opening the project in IntelliJ. You can resolve this by changing the compiler target:

[Image: IntelliJ compiler settings]