cake 🍰

Did you know that across 19,000 models in the Hugging Face text-generation category, 15.5% of the weights are duplicated? This amounts to approximately 43 terabytes of redundantly stored weights. arXiv paper with the full results coming soon™️.

cake is a more efficient way to download and store Machine Learning models from 🤗 Hugging Face. Think of it as 🐋 docker, but for ML models.

Leveraging the huggingface/safetensors format, it enables:

  • Downloading multiple layers in parallel.
  • Robustness against network failures. cake caches each layer to disk, so halting half-way and retrying will not re-download already downloaded layers.
  • Deduplication of layers based on their contents, even across different models. If you download Mistral-7B-v0.1 followed by a fine-tune of it that only modified the top two layers, then cake will only download those top two layers (see the sketch below).
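
Deduplication is possible because safetensors makes each layer's raw bytes individually addressable: the file starts with a JSON header recording every tensor's byte offsets, so each layer can be hashed on its own. Below is a minimal sketch of that content-addressing idea in Rust; it is illustrative only, not cake's actual implementation, and assumes the serde_json and sha2 crates.

```rust
// Illustrative sketch only, not cake's real code.
// Assumed dependencies: serde_json = "1", sha2 = "0.10".
use std::collections::BTreeMap;
use std::fs;

use serde_json::Value;
use sha2::{Digest, Sha256};

/// Hash every tensor ("layer") in a .safetensors file by its raw bytes.
fn layer_hashes(path: &str) -> std::io::Result<BTreeMap<String, String>> {
    let bytes = fs::read(path)?;

    // A safetensors file starts with an 8-byte little-endian header length,
    // followed by a JSON header mapping tensor names to dtype/shape/data_offsets.
    let header_len = u64::from_le_bytes(bytes[..8].try_into().unwrap()) as usize;
    let header: Value = serde_json::from_slice(&bytes[8..8 + header_len]).unwrap();
    let data = &bytes[8 + header_len..];

    let mut hashes = BTreeMap::new();
    for (name, meta) in header.as_object().unwrap() {
        if name == "__metadata__" {
            continue; // optional metadata entry, not a tensor
        }
        let offsets = meta["data_offsets"].as_array().unwrap();
        let begin = offsets[0].as_u64().unwrap() as usize;
        let end = offsets[1].as_u64().unwrap() as usize;
        // Identical layer bytes hash identically, even across different models,
        // which is what makes cross-model deduplication possible.
        let digest = Sha256::digest(&data[begin..end]);
        let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
        hashes.insert(name.clone(), hex);
    }
    Ok(hashes)
}
```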

Roadmap

  • Set up linting in CI
  • Set up local storage based on layer hashes
  • On push to main, build the executable and create a release
  • Make the CLI arguments easier to use for downloads (example: cake download foo instead of cake download --model-id foo)
  • Set up config and allow overriding of the storage folder, registry URL, etc.
  • Set up a public-facing instance of the hashes registry

Installation

Currently, cake can only be built from source. Pre-built binaries coming soon™️.

Usage

Run cake help to see how to use it.

Run cake download <MODEL_ID> to download a model into a folder called download, relative to cake (configurable storage coming soon™️).

Example: cake download KoboldAI/fairseq-dense-1.3B will download https://huggingface.co/KoboldAI/fairseq-dense-1.3B from the main branch.

Contributing

At this time, cake is a personal project of mine with two main aims:

  1. Introducing better tooling into ML workflows
  2. Learning the Rust programming language

Contributions targeting either of the above are appreciated and will be reviewed on a best-effort basis.

The idea behind cake

Given a model name (example: Mistral-7B-OpenOrca):

  1. Extract the layer hashes for the model
  2. Check if all the layers are stored locally
  3. Create a diff of the layers available locally and the layers required (see the sketch after this list)
  4. For each layer required:
    1. Pull it from the remote storage [1]
    2. Compress it for local storage
  5. Once all layers are available, export a new full model file
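
As a rough illustration of step 3, the diff can be as simple as a set difference over layer hashes. The function and variable names below are hypothetical, not cake's actual API:

```rust
use std::collections::HashSet;

/// Step 3 (sketch): diff the layer hashes the model needs against those cached locally.
fn missing_layers(required: &[String], local: &HashSet<String>) -> Vec<String> {
    required
        .iter()
        .filter(|hash| !local.contains(hash.as_str()))
        .cloned()
        .collect()
}

fn main() {
    let required: Vec<String> = ["aaa", "bbb", "ccc"].iter().map(|s| s.to_string()).collect();
    let local: HashSet<String> = ["aaa", "ccc"].iter().map(|s| s.to_string()).collect();
    // Only "bbb" has to be pulled from remote storage (step 4); "aaa" and "ccc" are reused.
    assert_eq!(missing_layers(&required, &local), vec!["bbb".to_string()]);
}
```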

Potential issues

As marked with [1], the "remote storage" is not fully figured out yet. Docker's idea of a registry could also work here. Using the Range HTTP header has allowed us to pull only specific layers from Hugging Face so far.

Example curl: curl --range 262175808-379616319 -L https://huggingface.co/KoboldAI/fairseq-dense-1.3B/resolve/main/model.safetensors\?download\=true -o model.safetensors
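
A rough equivalent in Rust, using the reqwest crate's blocking client. This is illustrative only; the byte range is the same one used in the curl example above.

```rust
// Illustrative sketch; assumes reqwest = { version = "0.11", features = ["blocking"] }.
use std::fs::File;
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url =
        "https://huggingface.co/KoboldAI/fairseq-dense-1.3B/resolve/main/model.safetensors?download=true";
    let bytes = reqwest::blocking::Client::new()
        .get(url)
        // Request only this layer's byte range; Hugging Face serves partial content for Range requests.
        .header(reqwest::header::RANGE, "bytes=262175808-379616319")
        .send()?
        .error_for_status()?
        .bytes()?;
    File::create("layer.bin")?.write_all(&bytes)?;
    Ok(())
}
```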