unxip

unxip is a command-line tool designed for rapidly unarchiving Xcode XIP files and writing them to disk with good compression. Its goal is to outperform Bom (which powers xip(1) and Archive Utility) in both performance and on-disk usage, and (at the time of writing) does so by a factor of about 2-3x in time spent decompressing and about 8% in space.

Installation

The easiest way to install unxip is to grab a precompiled binary for macOS 12.0 and later from the releases page. If you prefer, you can also install unxip from your package manager: it's available on MacPorts, and Homebrew. Both will make the latest version of the command available under the package name "unxip".

Building

unxip is fairly simple and implemented as a single file. Thus, you can build it by compiling that file directly, with just an up-to-date version of the Command Line Tools (xcode-select --install):

$ swiftc -parse-as-library -O unxip.swift

This will build an optimized unxip binary for your computer's native architecture. Because unxip uses Swift Concurrency, it is recommended that you build on macOS Monterey or later; macOS Big Sur is technically supported but needs to use backdeployment libraries that are not very easy to distribute with a command line tool.

If you prefer to use Swift Package Manager to build your code, a Package.swift is also available. This has the downside of requiring a full Xcode installation to bootstrap the build, but makes it easy to build a Universal binary:

$ swift build -c release --arch arm64 --arch x86_64

When run from the project root, the resulting executable will be located at .build/apple/Products/Release/unxip.

Finally, you may also use the provided Makefile to build and install unxip:

$ make all
$ make install

The installation prefix is configurable via the PREFIX variable.

unxip is not currently designed to be embedded directly into the address space of another application. While it would "work" (with minor modifications to allow linking) its implementation expects to be the only user of the cooperative thread pool and effectively takes it over, which may adversely affect other code that wishes to run on it. The recommended way to use unxip is spawning it as a subtask.

Usage

The intended usage of unxip is with a single command line parameter that represents the path to an XIP from Apple that contains Xcode. For example:

$ unxip Xcode.xip # will produce Xcode.app in the current directory

As the tool is still somewhat rough, its error handling is not very good at the moment. An attempt has been made to at least crash preemptively when things go wrong, but you may still run into strange behavior on edge cases. For best results, ensure that the directory you are running unxip from does not contain any existing Xcode(-beta).app bundles and that you are using a modern version of macOS on a fast APFS filesystem. For simplicity, unxip does not perform any signature verification, so if authentication is important you should use another mechanism (such as a checksum) for validation.

Contributing

When making changes, be sure to use swift-format on the source:

$ swift-format -i *.swift

Design

As a purpose-built tool, unxip outperforms Bom because of several key implementation decisions. Heavy use of Swift Concurrency allows unxip to unlock parallelization opportunities that Bom largely misses, and the use of LZFSE rather than the simpler LZVN gives it higher compression ratios. To understand its design, it's important to first be familiar with the Xcode XIP format and APFS transparent compression.

XIPs, including the ones that Xcode come in, are XAR archives, which contain a table of contents that lists each file inside and the compression used for each. However, unlike most XARs Xcode's only has two files: a bzip2-compressed Metadata that is just a few hundred bytes, and a multi-gigabyte file named Content that is stored "uncompressed". While marked as plain data, this file is an apparently proprietary archive format called pbzx. Luckily, the scheme is fairly straightforward and several people on the internet have already tried reverse engineering it. This tool contains an independent implementation that nonetheless shares many of its core details with these efforts. compressed in a format documented by compression_tool(1). The compressed content inside the pbzx is an ASCII-representation cpio archive, which has been split apart into 16MB chunks that have either been individually compressed with LZMA or included as-is. Unfortunately pbzx does not contain a table of contents, or any structure aside from these (byte-, rather than file-aligned) chunks, so distinguishing individual files is not possible without decompressing the entire buffer.

Parsing this cpio archive gives the necessary information need to reconstruct an Xcode bundle, but unxip (and Bom) go through an additional step to apply transparent APFS compression to files that could benefit from it, which significantly reduces size on-disk. For this operation, unxip chooses to use the LZFSE algorithm, while Bom uses the simpler LZVN. The compressed data is stored in the file's resource fork, a special header describing the compression is constructed in an xattr, and then UF_COMPRESSED is set on the file.

On the whole, this procedure is designed to be fairly linear, with the XIP being read sequentially, producing LZMA chunks that are reassembled in order to create the cpio archive, which can then be streamed to reconstruct an Xcode bundle. Unfortunately, a naive implementation of this is process does not perform very well due to the varying performance bottlenecks of each step. To make matters worse, the size of Xcode makes it infeasible to operate with entirely in memory. To get around this problem, unxip parallelizes intermediate steps and then streams results in linear order, benefiting from much better processor utilization and allowing the file to be processed in "sliding window" fashion.

On modern processors, single-threaded LZMA decoding is limited to about ~100 MB/s; as the Xcode cpio is almost 40 GB large, this is not really fast enough for unxip. Instead, unxip carves out each chunk from the pbzx archive into its own task (the metadata in the file format makes this fairly straightforward) and decompresses each in parallel. To limit memory usage, a cap is applied to how many chunks are resident in memory at once. Since the next step (parsing the cpio) requires logical linearity, completing chunks are temporarily parked until their preceding ones complete, after which they are all yielded together. This preserves order while still providing an opportunity for multiple chunks to be decoded in parallel. In practice, this technique can decode the LZMA stream at effective speeds approaching 1 GB/s when provided with enough CPU cores.

The linear chunk stream (now decompressed into a cpio) is then parsed in sequence to extract files, directories, and their associated metadata. cpios are naturally ordered–for example, all additional hardlinks must come after the original file–but Xcode's has an additional nice property that it's been sorted so that all directories appear before the files inside of them. This allows for a sequential stream of filesystem operations to correctly produce the bundle, without running into errors with missing intermediate directories or link targets.

While simplifying the implementation, this order makes it difficult for unxip to efficiently schedule filesystem operations and transparent compression. To resolve this, a dependency graph is created for each file (directories, files, and symlinks depend on their parent directory's existence, hardlinks require their target to exist) and then the task is scheduled in parallel with those constraints applied. New file writes are particularly expensive because compression is applied before the data is written to disk. While this step is already parallelized to some extent because of the graph described earlier, there is a chance for additional parallelism in Apple's filesystem compression implementation because it chunks data internally at 64KB chunk boundaries, which we can then run in parallel. LZFSE achieves high compression ratios and has a performant implementation, which we can take advantage of largely for free. Unlike most of our steps, which were compute-bound, the final step of writing to disk requires interacting with the kernel. If we're careless we can accidentally overload the system with operations and back up our entire pipeline. To prevent unconsumed chunks sitting around in memory, we manually apply backpressure on our stream by having them only yield results when later steps are ready to consume them.

Overall, this architecture allows unxip to utilize CPU cores and dispatch disk writes fairly well. It is likely that there is still some room for improvement in its implementation, especially around the constants chosen for batch sizes and backoff intervals (some of which can probably be done much better by the runtime itself once it is ready). Ideas on how its performance can be further improved are always welcome :)