/nebula

Real-time distributed storage for files and key value databases built on top of Hypercore Protocol

Primary LanguageJavaScriptGNU Affero General Public License v3.0AGPL-3.0

Nebula

Build Status

Nebula drives are real-time distributed storage for files and key value databases built on top of Hypercore Protocol. This project exists because the Telios email client needed a way to distribute and store encrypted emails on user's local file systems over a peer-to-peer (P2P) network. A lot of inspiration was taken from Hyperdrive, but Hyperdrive didn't have options for fine-grain access control, multiple writers, and the ability to delete files from disk once added to the drives.

Nebula drives come with a handful of useful features like:

  • Shareable over company firewalls and mobile networks: The P2P network runs on Hyperswarm which has the ability to hole-punch through most company firewalls and mobile connections.
  • Full Text Search: Create encrypted full text search indexes on top of Hypercores.
  • Access Control: Control access to each file by sharing the file's hash and the drive's discovery key.
  • Multiwriter: Drives can have multiple peers with write access by exchanging eachother's keys.
  • Collections: Along with files, drives can create and share simple key value btree databases built on Hyperbee. Collections also have the option to be encrypted with a secret key.

TODOs:

  • Connect to drives behind corporate firewalls and mobile networks
  • Create and share key value databases between peers
  • Upgrade multiwriter to Hypercore v10
  • Build full text search indexes from encrypted Hypercores
  • Upgrade collections API with Hyperbeedeebee
  • Upgrade access control to limit sharing by a peer's public key
  • Turn an existing directory into a drive and watch for changes

Installation

npm i @telios/nebula

Usage

/******************************************************
 * 
 * Create a new drive and write encrypted files to it
 * 
 ******************************************************/

const Drive = require('@telios/nebula')

// Optionally pass in an encryption key to encrypt the drive's databases
const encryptionKey = Buffer.alloc(32, 'hello world')

const localDrive = new Drive(__dirname + "/drive", null, { 
  keyPair,
  encryptionKey,
  swarmOpts: {
    server: true,
    client: true
  },
  fullTextSearch: true
})

await localDrive.ready()

// Key to be shared with other devices or services that want to seed this drive
const drivePubKey = localDrive.publicKey

// Clone a remote drive
const remoteDrive = new Drive(__dirname + "/drive_remote", drivePubKey, { 
  keyPair,
  swarmOpts: {
    server: true,
    client: true
  }
})

await remoteDrive.ready()

localDrive.on('file-sync', file => {
  // Local drive has synced somefile.json from remote drive
})

// Write a non-encrypted file to the drive
await remoteDrive.writeFile('/dest/path/on/drive/somefile.json', readableStream)

// Write anencrypted file to the drive
await remoteDrive.writeFile('/dest/path/on/drive/someEncryptedFile.json', readableStream, { encrypted: true })


/******************************************************************
 * 
 * Create an encrypted and shared database with full text search
 * 
 *****************************************************************/

const corpus = [
  {
    title: 'Painting 1',
    text_body: "In your world you can create anything you desire."
  },
  {
    title: 'Painting 2',
    text_body: "I thought today we would make a happy little stream that's just running through the woods here."
  },
  {
    title: 'Painting 3',
    text_body: "See. We take the corner of the brush and let it play back-and-forth. No pressure. Just relax and watch it happen."
  },
  {
    title: 'Painting 4',
    text_body: "Just go back and put one little more happy tree in there. Without washing the brush, I'm gonna go right into some Van Dyke Brown."
  },
  {
    title: 'Painting 5',
    text_body: "Trees get lonely too, so we'll give him a little friend. If what you're doing doesn't make you happy - you're doing the wrong thing."
  },
  {
    title: 'Painting 6',
    text_body: "Son of a gun. We're not trying to teach you a thing to copy. We're just here to teach you a technique, then let you loose into the world."
  }
]

const collection = await drive.db.collection('BobRoss')

for(const data of corpus) {
  await collection.insert({ title: data.title, text_body: data.text_body })
}

// Build a search index from the title and text_body properties
await collection.ftsIndex(['title', 'text_body'])

// Query the index
const query = await collection.search("happy tree", { limit: 10 })

API / Examples

const drive = new Drive(storagePath, [key], [options])

Create a drive to be shared over the network which can be replicated and seeded by other peers.

  • storagePath: The directory where you want the drive to be created.
  • key: The public key of the remote drive you want to clone

Options include:

{
  storage, // Override Hypercore's default random-access-file storage with a different random-access-storage module
  storageMaxBytes, // Maximum bytes the drive will store before turning off replication/file syncing
  encryptionKey,  // optionally pass an encryption key to encrypt the drive's database
  keyPair: { // ed25519 keypair
    publicKey, 
    secretKey
  },
  syncFiles: true | false // Sync all files from peer drives
  joinSwarm: true | false // Optionally set whether or not to join hyperswarm when starting the drive. Defaults to true.
  swarmOpts: { // Set server to true to start this drive as a server and announce its public key to the network
    server: true | false,
    client: true | false
  },
  checkNetworkStatus: true | false // Listen for when the drive's network status changes
  fulltextSearch: true | false // support full text search indexes
  blind: true // Initialize drive as a blind seeder. For example when you're seeding another encrypted drive and you don't have the encryption key
  broadcast: true // Tell other peers about this drive. Defaults to true.
}
const Drive = require('nebula-drive')

// Create a new local drive.
const localDrive = new Drive(__dirname + "/drive", null, { 
  keyPair,
  swarmOpts: {
    server: true,
    client: true
  }
})

await localDrive.ready()

// Key to be shared with other devices or services that want to seed this drive
const drivePubKey = localDrive.publicKey

// Clone a remote drive
const remoteDrive = new Drive(__dirname + "/drive_remote", drivePubKey, { 
  keyPair,
  swarmOpts: {
    server: true,
    client: true
  }
})

await remoteDrive.ready()

drive.publicKey

The drive's publicKey should ONLY be given to remote peers you would like to replicate with. All peers with this key will have write access to your drive's metadata (peers this drive connects to and generic file data). The publicKey + the drive's encryption key will give peers full write access.

drive.peerWriterKey

The key of this drive's primary writer Hypercore. This key can be used for adding and removing this drive as a remote peer.

drive.peers

A Set containing the public keys of all connected peers.

drive.encryptionKey

The drive's symmetric key used for encrypting the writer hypercores. Share this key with trusted peers only and as they will have full write access.

await drive.ready()

Initialize the drive and all resources needed.

await drive.addPeer(peer)

Adds a remote drive as a new writer. After a peer has been added, the drive will automatically try to reconnect to this peer after every restart. Drives automatically add remote peers that initialized their drives with a public key so this rarely needs to be manually called.

  • peer
    • blind: true | false drive.blind
    • publicKey: The remote peer's public key drive.publicKey
    • writer: The remote peer's writer core public key drive.peerWriter
    • meta: The remote peer's meta core public key drive.publicKey

Example Usage:

// Local drive on Device A
const drive1 = new Drive(__dirname + "/drive", null, { 
  keyPair,
  swarmOpts: {
    server: true,
    client: true
  }
})

// Local drive on Device B
const drive2 = new Drive(__dirname + "/drive", drive1.publicKey, { 
  keyPair,
  swarmOpts: {
    server: true,
    client: true
  }
})

// Writer and meta keys need to be exchanged between both devices for bi-directional writing.
// Drive2 already includes drive1 as a peer because it was initialized with drive1's publicKey
await drive1.addPeer({
  blind: drive2.blind,
  publicKey: drive2.publickey,
  writer: drive2.peerWriter,
  meta: drive2.publicKey
})

await drive.removePeer(peer)

Stop replicating with another drive peer.

  • peer
    • blind: true | false drive.blind
    • publicKey: The remote peer's public key drive.publicKey
    • writer: The remote peer's writer core public key drive.peerWriter
    • meta: The remote peer's meta core public key drive.publicKey
await drive1.removePeer({
  blind: drive1.blind,
  publicKey: drive1.publickey,
  writer: drive1.peerWriter,
  meta: drive1.publicKey
})

const file = await drive.writeFile(path, readableStream, [opts])

Write a file from a readable stream. When choosing to encrypt a file, the encryption key will be passed back in the response. Each file is encrypted with a unique key which should be stored separately.

  • path: Full path where the file resides on the local drive dir/to/my/file.jpg
  • readableStream: Any readableStream fs.createReadableStream()

Options include:

// When encrypted is true a key and header value will be returned after the file has been written
{ 
  encrypted: true 
}

const stream = await drive.readFile(path)

Creates a readable stream of data from the requested file path.

  • path: Full path where the file resides on the local drive dir/to/my/file.jpg

const stream = await drive.fetchFileByDriveHash(discoveryKey, fileHash, [opts])

Drives with many files may not want to announce every file by it's hash due to network bandwidth limits. In this case, a drive has the option of sharing it's discoveryKey which peers can use to connect to the drive and then make a request file hash request.

  • discoveryKey: Remote drive's discovery key drive.discoveryKey which is used by peers to request resources from the drive.
  • fileHash: Hash of the file being requested on the remote drive.
  • opts: If a key and header are passed in then the return stream will be the deciphered data
    • key: Encryption key used for deciphering the encrypted stream. This key is returned from the drive.writeFile method.
    • header: Needed for validating the encrypted stream. This gets returned from drive.writeFile().

const stream = drive.decryptFileStream(stream, key, header)

If drive.fetchFileByDriveHash is returning encrypted data, then decryptFileStream will transform that stream and return a new stream of deciphered data.

  • stream: Readable stream of encrypted data
  • key: Encryption key used for deciphering the encrypted stream. This key is returned from the drive.writeFile method.
  • header: Needed for validating the encrypted stream. This gets returned from drive.writeFile().

await drive.fetchFileBatch(files, cb)

Fetching files as a batch automatically chunks parallel requests in a fixed batch size so a drive can request as many files as it needs without impacting performance.

  • files: Array of file objects with the following structure
    • discovery_key: Remote drive's discovery key drive.discoveryKey which is used by peers to request resources from the drive.
    • hash: Hash of the file being requested on the remote drive.
    • key: Encryption key used for deciphering the encrypted stream. This key is returned from the drive.writeFile method.
    • header: Needed for validating the encrypted stream. This gets returned from drive.writeFile().
  • cb: Callback method that runs after every file stream has been initialized. Use this for handling what to do with the individual file streams. Note that this should return a promise.

Example Usage:

await drive.fetchFileBatch(files, (stream, file) => {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(`./${file.path}`)
    pump(stream, writeStream, (err) => {
      resolve()
    })
  })
})

await drive.close()

Fully close the drive and all of it's resources.

await drive.stat()

Returns the drive's storage info

{
  file_bytes: 0,
  core_bytes: 0,
  total_bytes: 0
}

drive.on('message', (peerPubKey, socket) => {})

Emitted when the drive has recieved a message from a peer.

  • peerPubKey: Public key of the peer that sent the message
  • socket: The socket returned on this event can be used as a duplex stream for bi-directional communication with the connecting peer. socket.write socket.on('data, data => {})

drive.on('collection-update', (item) => {})

Emitted when a collection has received an update from a remote peer

  • item
    • collection: The collection that was updated
    • value: JSON value of the new update

drive.on('file-add', (file, enc) => {})

Emitted when a new file has been added to a local drive.

  • file: A file object
    • path: drive path the file was saved to
    • hash: Hash of the file
  • enc: Passes back properties needed to decrypt the file
    • key: Key needed to decrypt the file
    • header: Needed for validating the encrypted stream

drive.on('sync', () => {})

Emitted when the drive has synced remote data.

drive.on('file-sync', (file) => {})

Emitted when the drive has synced remote a remote file.

drive.on('file-unlink', (file) => {})

Emitted when a file has been deleted on the drive.

drive.on('fetch-error', (err) => {})

Emitted when there has been an error downloading from the remote drive.

drive.on('network-updated', (network) => {})

Emitted when either the internet connection or the drive's connection to Hyperswarm has changed. The drive option checkNetworkStatus must be set to true in order for these events to be emitted.

Returns:

  • network
    • internet: true|false
    • drive: true|false

drive.on('peer-connected', (peer) => {})

Emitted when a remote peer connects and starts replicating.

drive.on('peer-disconnected', (peer) => {})

Emitted when a remote peer disconnects.

Drive Database API

Drive databases mimic the MongoDB API. Full API documentation can be found on Hyperbeedeebee

const collection = await drive.db.collection(name)

Creates a new collection. Collections are automatically encrypted when a drive is instantiated with encryptionKey (drive.encryptionKey)

await collection.insert(doc)

Inserts a new document into the collection.

Example:

const doc = await collection.insert({ name: 'alice', age: 37 })

// doc._id gets set to an ObjectId if you don't specify it

await collection.find(query)

Search through all documents by query

Example:

const docs = await collection.find({ name: 'alice' })

await collection.createIndex(fields, [opts])

Creates an index for a set of fields. This will speed up queries and is required for sorting by fields

Example:

const docs = await collection.createIndex(['name','address'])

await collection.ftsIndex([prop1, prop2, ...], docs)

Create a full text search index from a collection's properties

const docs = []

// Documents can only be added to search indexes after they've been inserted into a collection.
const doc = await collection.insert(item)

docs.push(doc)

await collection.ftsIndex(['address', 'first_name', 'last_name'], docs)

const results = await collection.search(query, [opts])

Query a searchable index

Options include:

{ 
  limit: 10 
}