A closer-to-the-spec implementation of ZIP parsing for Java.
The notes and structure outlines are the basis for most of LLJ-ZIP.
The JVM ZIP reader implementation is based on this piece, whose comment reads:
> This is a zip format reader for seekable files, that tolerates leading and trailing garbage, and tolerates having had internal offsets adjusted for leading garbage (as with Info-Zip's zip -A).
But that's not all it does; that's just what the comment says. Some other fun quirks of the JVM ZIP parser:
- The end of central directory record is found by scanning backward from the end of the file, rather than forward from the beginning.
- The central directory values are authoritative. Names and values defined by the local file headers are ignored.
- A local file header's file data is not bounded by that header's compressed size field. The size declared by the matching central directory header is used instead.
- Class names are allowed to end with a trailing `/`, which most tools interpret as a directory.
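The quoted JDK comment and the backward-scanning quirk come down to two steps: scan backward from the end of the data for the end of central directory record, then treat any gap between where the central directory actually sits and where the record says it sits as leading garbage. The following is a simplified, hypothetical sketch of that logic, loosely modeled on the JDK's `ZipFile` source rather than on LLJ-ZIP's code. The class `JvmEndScan` and its `End` record are invented for illustration; the sketch works on an in-memory `byte[]` and ignores ZIP64.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch (not LLJ-ZIP or JDK API): locate the end of central directory
// record by scanning backward, then derive the length of any prepended garbage.
final class JvmEndScan {
    static final int LOC_SIG = 0x04034b50; // "PK\3\4" local file header
    static final int CEN_SIG = 0x02014b50; // "PK\1\2" central directory file header
    static final int END_SIG = 0x06054b50; // "PK\5\6" end of central directory
    static final int END_HDR = 22;         // fixed size of the end record

    /** endPos: end record offset; cenPos: central directory offset; locPos: leading garbage length. */
    record End(long endPos, long cenPos, long locPos) {}

    /** Returns the located end record, or null if no plausible one exists. */
    static End findEnd(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int start = zip.length - END_HDR;
        int min = Math.max(0, start - 0xFFFF); // the archive comment is at most 0xFFFF bytes
        for (int pos = start; pos >= min; pos--) {
            if (buf.getInt(pos) != END_SIG) continue;
            long cenLen = buf.getInt(pos + 12) & 0xFFFFFFFFL; // central directory size
            long cenOff = buf.getInt(pos + 16) & 0xFFFFFFFFL; // declared central directory offset
            int comLen = buf.getShort(pos + 20) & 0xFFFF;     // declared comment length
            // The central directory is assumed to sit directly before the end record,
            // so any difference from its declared offset is prepended garbage.
            long cenPos = pos - cenLen;
            long locPos = cenPos - cenOff;
            if (cenPos < 0 || locPos < 0) continue;
            if (pos + END_HDR + comLen != zip.length
                    && (buf.getInt((int) cenPos) != CEN_SIG || buf.getInt((int) locPos) != LOC_SIG)) {
                continue; // trailing bytes present: keep this match only if both derived positions check out
            }
            return new End(pos, cenPos, locPos);
        }
        return null;
    }
}
```

Because `locPos` is derived rather than read from the file, an archive with a prepended stub resolves correctly whether or not its offsets were adjusted with `zip -A`; an adjusted archive simply yields `locPos == 0`.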
Features:
- Reads ZIP files using `Unsafe`-backed mapped files.
  - Using `FileChannel.map` yields a `MappedByteBuffer`, which uses `int` indices, limiting a single mapping to about 2 GB (`Integer.MAX_VALUE` bytes).
  - Our `UnsafeMappedFile` implementation uses `long` offsets, which far exceed that 2 GB limit.
- Highly configurable, offering 3 ZIP reading strategies out of the box (see `ZipIO` for convenience calls):
  - Std / Forward scanning: Scans for `EndOfCentralDirectory` from the front of the file, like many other tools.
  - Naive: Scans only for `LocalFileHeader` values from the front of the file. The fastest implementation, but obviously naive.
  - JVM: Matches the behavior of the JVM's ZIP parser, including a number of odd edge cases. Useful for opening JAR files to mirror `java -jar <path>` behavior.
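To make the difference between the Std / Forward scanning and Naive strategies more concrete, here is a hypothetical sketch of the raw signature searches each one starts from. The class `ScanStrategies` and its methods are illustrative names only; LLJ-ZIP's actual readers (reachable through `ZipIO`) parse and validate far more than a bare signature search.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not LLJ-ZIP's readers: the signature searches behind
// the forward-scanning and naive strategies.
final class ScanStrategies {
    static final int LOC_SIG = 0x04034b50; // "PK\3\4" local file header
    static final int END_SIG = 0x06054b50; // "PK\5\6" end of central directory

    /** Forward scanning: offset of the first end of central directory signature from the front, or -1. */
    static int forwardFindEnd(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        for (int pos = 0; pos + 22 <= zip.length; pos++) { // 22 = fixed end record size
            if (buf.getInt(pos) == END_SIG) return pos;
        }
        return -1;
    }

    /** Naive: every offset where a local file header signature appears, central directory ignored. */
    static List<Integer> naiveLocalHeaders(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = 0; pos + 30 <= zip.length; pos++) { // 30 = fixed local header size
            if (buf.getInt(pos) == LOC_SIG) offsets.add(pos);
        }
        return offsets;
    }
}
```

In this sketch the naive scan reports every `PK\3\4` it sees, so a stored entry that itself contains that byte sequence (such as an uncompressed nested ZIP) would produce a bogus extra header, which illustrates why such a strategy is fast but naive.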
Maven dependency:
<dependency>
<groupId>software.coley</groupId>
<artifactId>lljzip</artifactId>
<version>${zipVersion}</version> <!-- See release page for latest version -->
</dependency>
Gradle dependency (either notation works):
implementation group: 'software.coley', name: 'lljzip', version: zipVersion
// or, using the string notation:
implementation "software.coley:lljzip:${zipVersion}"
Basic usage:
// ZipIO offers a number of different utility calls for using different ZipReader implementations
ZipArchive archive = ZipIO.readJvm(path);
// Local files have the actual file data/bytes.
// These entries mirror data also declared in central directory entries.
List<LocalFileHeader> localFiles = archive.getLocalFiles();
for (LocalFileHeader localFile : localFiles) {
    // Data model mirrors how a byte-buffer works.
    ByteData data = localFile.getFileData();
    // You can extract the data to a raw byte[]
    byte[] decompressed = ZipCompressions.decompress(localFile);
    // Or do so with a specific decompressor implementation (appropriate when the entry uses DEFLATE)
    byte[] inflated = localFile.decompress(DeflateDecompressor.INSTANCE);
}
// Typically used for authoritative definitions of properties.
// Some ZIP logic will ignore properties of 'LocalFileHeader' values and use these instead.
// - Try using a hex editor to play around with this idea. Plenty of samples in the test cases to look at.
List<CentralDirectoryFileHeader> centralDirectories = archive.getCentralDirectories();
// Information about the archive and its contents.
EndOfCentralDirectory end = archive.getEnd();
For more detailed example usage, see the tests.
How does each `ZipReader` implementation map to standard Java ZIP handling?
If you want to see which implementation models each of Java's standard ways of reading ZIP files, here's a table for reference:
| Closest Java equivalent | LL-Java-Zip |
|---|---|
| `ZipFile` | `JvmZipReader` / `ZipIO.readJvm(...)` |
| `ZipInputStream` | `ForwardScanZipReader` / `ZipIO.readStandard(...)` |
| N/A | `NaiveLocalFileZipReader` / `ZipIO.readNaive(...)` |
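For comparison, the two `java.util.zip` readers named in the table can be exercised directly. In this sketch, `JdkZipReaders` is a hypothetical helper class; only the `java.util.zip` calls are real JDK API. `ZipFile` locates the central directory from the end of the file, while `ZipInputStream` walks local file headers from the front of the stream, so the two can disagree about the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists entry names with each of the two JDK readers.
final class JdkZipReaders {
    /** ZipFile: random access driven by the central directory. */
    static List<String> namesViaZipFile(Path path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path.toFile())) {
            for (ZipEntry entry : Collections.list(zip.entries())) names.add(entry.getName());
        }
        return names;
    }

    /** ZipInputStream: reads local file headers sequentially from the start of the stream. */
    static List<String> namesViaZipInputStream(byte[] bytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) names.add(entry.getName());
        }
        return names;
    }
}
```

For instance, after prepending a few bytes of garbage to an archive, `ZipFile` still lists its entries, but `ZipInputStream.getNextEntry()` returns `null` immediately because the first four bytes are not a local file header signature.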
Due to some `sun.misc.Unsafe` hacks (for performance and `long` addressing), you will get compiler warnings when first opening the project in IntelliJ.
You can resolve this by changing the compiler target: