This repository contains a simple versioned key value store library implemented in Clojure. The library is loosely modeled after Irmin, a distributed database built on the same principles as Git implemented in OCaml.
Core high level functions for interacting with the repository. Currently we can:
- Initialize a repository using different storage engines
- Commit new values and updates to existing ones
- Get the most recent value by key
- List all keys
Future developments might include: removing values from the most recent revision, history listing, branching and merging. Another limitation of the current implementation is that all blobs (values) are in the root tree. If support for trees that contain trees is added, we can support hierarchical keys.
A full example how to use the library can be found in the
tursas.example
namespace.
;; Commit a new value to the repository
(tursas/commit repo "master" "foo" {:a 1 :b 2})
;; Get value of a key from the repository
(tursas/get-value repo "master" "foo")
;; Returns {:a 1 :b 2}
;; List all keys
(tursas/get-keys repo "master")
;; Returns ("foo")
An example screenshot of history accumulated in a Git loose object repository after committing a few updates:
Storage API is defined using multimethods in namespace tursas.storage.api. Various storage engines can be added by making suitable implementation of the multimethods in this API namespace. In this repository there are two implementations: an in-memory store using Clojure data structures and a Git bare repository using loose object format. The implementation can be found in the impl subdirectory.
The namespace tursas.storage.impl.in-memory-map
contains an in-memory implementation of the repository
storage engine. It uses an STM ref that contains the mutable object
store. The store contains a map for Git objects and another map
structure for storing refs (such as branches).
This storage implementation uses the same Git object serialization based hashing as the Git loose object store so their object hashes should be identical. The values themselves are stored as plain old Clojure data structures without serializing them to byte arrays. This allows the repository structure to be inspected a bit more easily.
The namespace tursas.storage.impl.loose-object
contains an object store implementation using Git loose objects in the bare repository
format. In the bare format the repository contains is only the Git data store without
any checked out files.
In the loose object format the repository objects are stored in files under the objects directory. The objects are stored in a format that contains a header describing object type and content length, then a NULL byte and the object contents after that. The actual parsing part of the Git object formats is implemented in the tursas.hash-utils namespace.
The hashes are calculated using the serialized format of the Git objects. After hashing, the object contents are compressed using the deflate algorithm and stored under the objects directory in subdirectories whose names are the first two characters of the content hash and file names consist of the rest of the hash.
The loose object format bare repository created using this storage
engine is compatible with the Git command line tools and at least some
history visualization tools (such as gitk
). If the repository grows too large Git
may automagically upgrade the format to use pack files which are not
yet supported by this version of the storage engine.