Another attempt at building a fault-tolerant storage system.
It provides no ACID guarantees and no transaction isolation.
It can store JSON objects only.
- Primary keys must be globally unique, for example generated with uuid.NewRandom()
- A JSON object contains a number of values and always starts with '{' (the root cannot be anything other than an object)
- Search index keys can be specified and will be replicated globally
- Any update will be transformed into a delta JSON and replicated as a delta only (TODO)
- All updates will have a timestamp and be applied based on timestamp value
- A global log of all updates is stored and kept for a limited amount of time (TODO cleanup of old records)
- All checkpoints also have a log count value that represents the size of work so far and allows detecting missing logs
- Local data is stored in a single leveldb in order to allow consistent snapshots
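The object-only rule above could be enforced before a record is stored. The following is a minimal sketch using only the standard library; validateRecord is a hypothetical helper name, not something defined by this system.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// validateRecord is a hypothetical helper (not part of the described system).
// It accepts data only if it is valid JSON whose root value is an object.
func validateRecord(data []byte) error {
	if !json.Valid(data) {
		return errors.New("record is not valid JSON")
	}
	// Valid JSON may carry leading whitespace, so look at the first real byte.
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return errors.New("record root must be a JSON object")
	}
	return nil
}

func main() {
	fmt.Println(validateRecord([]byte(`{"name":"a"}`)) == nil) // true
	fmt.Println(validateRecord([]byte(`[1,2,3]`)) == nil)      // false
}
```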
Database consistency:
- Log entries sorted by nanotime timestamp
- Checkpoints will include number of known log entries
- One checkpoint is produced every 24 hours by default, to ensure that a server that goes down can come back up
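As an illustration of the two consistency rules above, the sketch below sorts entries by timestamp and compares a local entry count against the count carried by a checkpoint. The logEntry type and both function names are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"sort"
)

// logEntry is a hypothetical in-memory form of a journal entry.
type logEntry struct {
	UnixNano int64  // nanosecond timestamp of the update
	Payload  string // update body, kept opaque here
}

// sortLog orders entries by timestamp, keeping the input order for ties.
func sortLog(entries []logEntry) {
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].UnixNano < entries[j].UnixNano
	})
}

// missingEntries reports how many entries the local log lacks compared
// with the count recorded in a checkpoint (0 if nothing is missing).
func missingEntries(checkpointCount uint64, local []logEntry) uint64 {
	n := uint64(len(local))
	if n >= checkpointCount {
		return 0
	}
	return checkpointCount - n
}

func main() {
	entries := []logEntry{{30, "c"}, {10, "a"}, {20, "b"}}
	sortLog(entries)
	fmt.Println(entries[0].Payload, entries[1].Payload, entries[2].Payload)
	fmt.Println(missingEntries(5, entries))
}
```

A count mismatch only says that entries are missing, not which ones; identifying them would need the log keys themselves.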
leveldb prefixes are fixed at 3 bytes for ease of storage.
- idx+key = id (indices)
- log+logid = log (journal, indexed by timestamp + hash)
- dat+id = data (data)
- nfo+id = version (16 bytes current record version)
- kdt+id = keys (key data, 32 bits length of key followed by key, repeated for each key)
- typ+type_name = type data
- chk+id = checkpoint
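The prefix layout and the kdt value format above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the byte order of the 32-bit length is not stated in these notes, so big-endian is assumed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dbKey builds a leveldb key from a 3-byte prefix ("idx", "log", "dat", ...)
// and a suffix. It panics if the prefix is not exactly 3 bytes long.
func dbKey(prefix string, suffix []byte) []byte {
	if len(prefix) != 3 {
		panic("prefix must be exactly 3 bytes")
	}
	k := make([]byte, 0, 3+len(suffix))
	k = append(k, prefix...)
	return append(k, suffix...)
}

// encodeKeyList packs keys as [32-bit length][key bytes], repeated.
// ASSUMPTION: the length is big-endian; the notes do not say.
func encodeKeyList(keys [][]byte) []byte {
	var out []byte
	for _, k := range keys {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(k)))
		out = append(out, n[:]...)
		out = append(out, k...)
	}
	return out
}

// decodeKeyList reverses encodeKeyList and rejects truncated input.
func decodeKeyList(buf []byte) ([][]byte, error) {
	var keys [][]byte
	for len(buf) > 0 {
		if len(buf) < 4 {
			return nil, errors.New("truncated key length")
		}
		n := binary.BigEndian.Uint32(buf[:4])
		buf = buf[4:]
		if uint64(n) > uint64(len(buf)) {
			return nil, errors.New("truncated key data")
		}
		keys = append(keys, buf[:n])
		buf = buf[n:]
	}
	return keys, nil
}

func main() {
	val := encodeKeyList([][]byte{[]byte("a"), []byte("bcd")})
	keys, err := decodeKeyList(val)
	fmt.Printf("%q %x %q %v\n", dbKey("kdt", []byte("id1")), val, keys, err)
}
```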
Key: log + timestamp (16 bytes) + hash (sha256)
- We perform one checkpoint every 24 hours
- Checkpoint is rounded timestamp + number of log entries in the past 24 hours + xor of log keys
- We only keep the 100 latest checkpoints (3+ months)
- We perform a snapshot within a few minutes of a checkpoint being reached
- The snapshot will include all the changes up to the checkpoint, and some more
- Upon starting from zero, a node will download and load the latest snapshot, then apply the subsequent log entries, skipping any entries already included in the snapshot since those are considered already applied
- Log entries prior to the snapshot date will not be included in the snapshot