/protos-db

A set of filesystem-based databases for iterative prototyping for Node.js

Primary LanguageJavaScript

NOTE: Extremely unstable, work in progress

What's this?

This is a filesystem-based database library I quickly wrote to assist with iterative prototyping. This is intended as a holdover for Node.js projects that don't yet implement a "real" database.

Types of databases

There are so far two types of databases:

  • Blob database
  • Time Record database (table-oriented, append-only, indexed by timestamp)

You should choose the database(s) that are most relevant to your use cases.

Immutable, addressible data is stored as "Blobs"

Each blob is a gzip-compressed string that's addressable by its SHA1 hash.

Why on earth would you want this?

  • a) To avoid persisting the same data twice - save on hard drive space. This is desirable for sparse-yet-large data that you want to normalize
  • b) To compress file contents that contain lots of repetition, such as JSON.

Usage:

const { createBlobDb } = require('protos-db')

const blobDb = createBlobDb('./path/to/my-blob-db')

blobDb.persistDataAsBlob("Hello World!").then(hash => {
  console.log(`Persisted Hash: ${hash}`)
})
// => Prints "Persisted Hash: 2ef7bde608ce5404e97d5f042f95f89f1c232871"

// later on in your program...

blobDb.readBlobDataAsString("2ef7bde608ce5404e97d5f042f95f89f1c232871").then(contents => {
  console.log(`Contents: ${contents}`)
})
// => Prints "Contents: Hello World!"

Blob data structure, tree

A blob directory might look something like this:

my-blob-db
├── 2a
│   └── ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz
├── 87
│   └── 2e18a933e6c41dc5abe6c29a38b58959c8112b.gz
└── b8
    └── dfb080bc33fb564249e34252bf143d88fc018f.gz

The database above contains three blobs:

  • 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
  • 872e18a933e6c41dc5abe6c29a38b58959c8112b
  • b8dfb080bc33fb564249e34252bf143d88fc018f

If you read the blob, you get a string:

$ zcat my-blob-db/2a/ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz
hello world
$

If you rehash the uncompressed string, you get the directory+filename back:

$ zcat my-blob-db/2a/ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz | sha1sum
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed  -
$

Notice how the first two characters of the SHA1 are used for the directory names. This is inspired by Git and CDNs that do the same.

  • The first reason is for performance. Many file systems, such as Ext*, scan files in a directory linearly. I.e. they start from the top, and check one-by-one until a match is found.
  • The second reason for doing this is to stay within the file system's file limit for directories. On Ext3, this is about 32,000.
  • The last reason is for humans. If you're troubleshooting the database, you don't want to ls the directory and get spammed with tens of thousands of files up-front. By fanning out to 256 buckets, that 10,000 becomes 39.

Append-only data is stored as records in tables

All records are indexed by calendar day.

Usage:

const { createRecordDb } = require('protos-db')

// We must additionally supply a function to extract the ISO-8583 timestamp field.
// All records in all tables must have a way to extract a ISO-8583 timestamp, e.g. a field.
const recordDb = createRecordDb('./path/to/my-record-db', record => record.timestamp)

// There's new information about Marge Simpson!
recordDb.appendRecordToTable('marge-simpson-info', {
  timestamp: "2018-07-27T09:56:22Z",
  name: "Marge Simpson",
  address: "123 Fake Street",
  children: 3
}).then(() => {
  console.log(`Appended record`)
})

// later on in your program...

recordDb.readLatestRecord('marge-simpson-info').then(record => {
  console.log(`Record: ${JSON.stringify(record)}`)
})
// => Prints "Record: {"timestamp":"2018-07-27T09:56:22Z","name":"Marge Simpson","address":"123 Fake Street","children":3}"