/sqlite-wasm-http

An experimental HTTP VFS driver for SQLite WASM

Primary LanguageTypeScriptISC LicenseISC

sqlite-wasm-http

SQLite WASM with HTTP VFS

License: ISC Node.js CI codecov

This project is inspired by @phiresky/sql.js-httpvfs but uses the new official SQLite WASM distribution.

It includes a number of improvements over the first version:

  • Based upon what will probably be the industry reference (backed by SQLite and Google)

  • Supports multiple concurrent connections to the same database with shared cache

    The shared cache version uses SharedArrayBuffer which requires that the server hosting the JS code sends Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers (aka CORS 2).

  • Simplified fall-back version without support for sharing cache between worker threads that does not require SharedArrayBuffer

  • Aims to support all bundlers out-of-the-box without special configuration

You can see a live demo of the shared cache version here.

The Github Pages live demo uses the sync backend since as of February 2024 Github Pages still does not support cross-origin isolation (please, upvote).

Ony ES6 module mode is supported at the moment, CommonJS is not supported and this includes TypeScript transpiled to CommonJS - you have to transpile to ES6 in order to use this module.

You can check test/integration for examples for the various environments that are currently tested and supported.

Node.js is fully supported but requires web-worker and fetch available in Node.js 18.x+.

If you intend to use Node.js only for bundling without using SQLite in a standalone application, then the minimum required version is Node.js 16.x.

Status

Experimental

Usage

If you are not already familiar with @phiresky/sql.js-httpvfs, there is a brief presentation in the Overview section.

You can also check ol-mbtiles for an example project that uses this library - it implements remote rendering of HTTP-hosted .mbtiles in OpenLayers.

Page size

It is highly recommended to decrease your SQLite page size to 1024 bytes for maximum performance:

PRAGMA JOURNAL_MODE = DELETE;
PRAGMA page_size = 1024;
-- Do it for every FTS table you have
-- (geospatial datasets do not use full text search)
INSERT INTO ftstable(ftstable) VALUES ('optimize');
-- Reorganize database and apply changed page size
-- Sometimes you will be surprised by the new size of your DB
VACUUM;

Using the SQLite API

This method allows using the raw SQLite interface with the added support of an HTTP VFS.

import { createSQLiteThread, createHttpBackend } from 'sqlite-wasm-http';

// MBTiles is a common format for storing both vector and
// raster maps in an SQLite database
const remoteURL = 
  'https://velivole.b-cdn.net/maptiler-osm-2017-07-03-v3.6.1-europe.mbtiles';
// createHttpBackend will autodetect if you can use SharedArrayBuffer or not
const httpBackend = createHttpBackend({
  maxPageSize: 4096,    // this is the current default SQLite page size
  timeout: 10000,       // 10s
  cacheSize: 4096       // 4 MB
});
// Multiple DB workers can be created, all sharing the same backend cache
// db is a raw SQLite Promiser object as described here:
// https://sqlite.org/wasm/doc/trunk/api-worker1.md
const db = await createSQLiteThread({ http: httpBackend });
// This API is compatible with all SQLite VFS
await db('open', { filename: 'file:' + encodeURI(remoteURL), vfs: 'http' });
await db('exec', {
  sql: 'SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles ' +
    'WHERE zoom_level = 10 AND tile_column = $col AND tile_row = $row',
  bind: { $col: 600, $row: 600 },
  callback: (msg) => {
    if (msg.row) {
      console.log(msg.columnNames);
      console.log(msg.row);
    } else {
      console.log('end');
    }
  }
});
// This closes the DB connection
await db('close', {});
// This terminates the SQLite worker
db.close();
await httpBackend.close();

Using the automated pool

A higher-level API allows to automatically use concurrent HTTP connections to the same SQLite database.

Unlike the previous API which is compatible with all SQLite VFS, this one works only for HTTP remote access.

const remoteURL = 
  'https://velivole.b-cdn.net/maptiler-osm-2017-07-03-v3.6.1-europe.mbtiles';
const pool = await createSQLiteHTTPPool({ workers: 8 });
await pool.open(remoteURL);
// This will automatically use a free thread from the pool
const tile = await pool.exec(
    'SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles ' +
    'WHERE zoom_level = 10 AND tile_column = $col AND tile_row = $row',
    { $col: 600, $row: 600 });
console.log(tile[0].columnNames);
console.log(tile[0].row);
// This shutdowns the pool
await pool.close();

Further information

Overview

This package includes a browser-compatible version of SQLite3 compiled to WASM. SQLite3 already supports user-defined VFS - both in its original C API and in its very recent new JS API. The JS API includes an OPFS VFS driver which is the reason this project is being pushed by Google and has very good chances of becoming an industry reference. This package adds an additional HTTP VFS driver that uses HTTP Range requests - the same that are used by clients supporting resuming of failed downloads - to implement a filesystem-like random access for SQLite3.

The main drawback of SQLite3 - as it is the case of almost all C/C++ software built to WASM for the web - is that it is fully synchronous. Accordingly, the SQLite3 WASM comes with two APIs - one, fully synchronous, which works a lot like the C/C++ version, and another one - which runs SQLite3 in a Web Worker and communicates with it by message passing implemented in a Promise-based wrapper.

Currently, the builtin multithreading of the C/C++ version of SQLite3 is not enabled in the WASM version. This means that the only way to have multiple concurrent connections to one (or more) databases is to run several independent SQLite3 workers.

OPFS and HTTP further complicate that situation - as both are intrinsically asynchronous APIs. This is why the HTTP driver comes in two flavors:

  • a more modern one, that uses a dedicated HTTP worker thread to run asynchronously all HTTP operations for all SQLite3 workers and implements sharing of the cache between those
  • and a more compatible one, that runs synchronously all HTTP operations in the SQLite3 thread that invoked them and does not support cache sharing between workers

If you do not intend to run concurrent queries using multiple workers, both backends will be equivalent to you.

The driver is smart enough to select the appropriate backend according to whether SharedArrayBuffer is available or not.

Will write access ever be possible?

Short answer: Maybe, in some cases.

Long answer: It won't have the same universal support as read-only access though. There is a Content-Range header for HTTP bodies - that is used in the response of an HTTP GET request that carries a Range header. The RFC does not say anything about this header being used for PUT requests. Most web server do not support it. Apache does support it if the WebDAV extensions are enabled. Maybe other servers support in specific configurations too. Support on public infrastructure servers, especially the low-cost ones, will likely be very rare.

Performance

Prefetching and code-splitting

This module has been designed for being bundled with a modern web bundler - which should correctly identify the chunks to be shared between main and the workers.

Also, if you bundler supports preloading, preloading the WASM chunk can greatly improve the initial loading times. You should check webpack.config.cjs for an example that uses preloading.

Apache httpd configuration fragment

These are all the options required for maximum performance:

Header always append Cross-Origin-Embedder-Policy "require-corp"
Header always append Cross-Origin-Opener-Policy: "same-origin"
AddOutputFilterByType DEFLATE application/wasm

They must be set on the origin - the main entry point as it is displayed in the user's URL bar. When using an iframe, the iframe must have them as well as all of its parents up to the origin, as well as the special iframe attribute: <iframe allow="cross-origin-isolated">.

Be careful as depending on the Apache version and its default configuration, setting AddOutputFilterByType in a VirtualHost might override (instead of extending) the default compression configuration.

Developer mode

npm run start

If you open http://localhost:9000 you will have a SharedArrayBuffer environment, and if you open http://<your-ip>:9000, you will have a legacy environment. This is because a non-https environment can be cross-origin isolated only if running on localhost.

Debug output

Browser mode:

SQLITE_DEBUG=vfs,threads,cache npm run start

Node.js

SQLITE_DEBUG=vfs,threads,cache mocha

Overview

Overview

Copyright

ISC License

Copyright (c) 2023-2024, Momtchil Momtchev momtchil@momtchev.com

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.