jvail/spl.js

Running spl.js in a shared worker

Closed this issue · 15 comments

duvifn commented

Hello @jvail,
This is a feature request.
I have a use case where it would be useful to access a db from a different iframe/window.

Is it possible to make spl.js running from a SharedWorker rather than a dedicated worker?

Thank you very much!

jvail commented

Hello @duvifn,

I think I have tried to disable almost everything that is related to threading/concurrency to avoid any headaches for the compilation. And I am afraid I am not up-to-date with all the developments in emscripten etc to support threads and so forth. I am not sure if in your use case having a SharedWorker necessarily involves a shared db connection? I think a starting point could be to look into SQLite's own WebAssembly port or SQL.js to see if it is possible in SQLite. If so then it may well be possible with SpatiaLite.

duvifn commented

Hi, thanks for your reply.

Hope I correctly understood you.
I'm not looking for a shared connection/threading in the WebAssembly level.
Instead I'm looking for something equivalent to an sqlite single process running on a server which accept requests (similar to what is done now in spl.js with a dedicated worker).
So, there is only one process that actually access the DB, but it accepts requests from many clients, instead of one.

duvifn commented

For my use case it's fine that there would be only one connection per db.

jvail commented

I have to confess I am not familiar with the SharedWorker API. So if all reads/writes are sequential then from the point of view of the db worker it should not be a big difference. Did you try to just replace Worker with a SharedWorker instance (I'd guess it requires a few adjustments). If you would create a test branch I could maybe help you a bit.

duvifn commented

Did you try to just replace Worker with a SharedWorker instance (I'd guess it requires a few adjustments). If you would create a test branch I could maybe help you a bit.

Thanks.
I created this branch with experimental SharedWorker support. It avoids dealing with any complex concurrency problems, by simply creating a db connection per user port. This is a WIP and I wrote only a few tests for it.

Note that by default sqlite supports any number of concurrent read transactions but only one write transaction (in the default mode). In case of more than one writing transaction sqlite would fail with SQLITE_BUSY. The user can provide a parameter for busy timeout to make sqlite waiting for a given time before failing with that error.
By default, if the user doesn't use explicit transaction (denoted with BEGIN... and COMMIT), this is not a problem at all since JS execution model guarantees to have only one transaction at time. In my implementation, in a case of explicit transaction it's up to the user to manage write transactions or to supply a reasonable busyTimeout (Edit: I suspect supporting timeout would force me to add ASYNCIFY handling, so I will leave it for now) and to handle errors caused by concurrent write accesses to a db.

Another limitation is that currently, extensions should be defined on worker creation only, and can't be updated after that (I didn't want to change current code model, but it could be changed relatively easily).
That means that only the first page to initialize the shared worker can create extensions.
If you have any comments I would be glad to hear.

Thank you very much!

jvail commented

Thanks for sharing the code @duvifn. Please give me a bit of time to read it and go through the SharedWorker docu essentials.

jvail commented

Hi @duvifn,

I read the SharedWorker documentation and looked at a few examples. My first impression was that this quite interesting and should not be too difficult to just replace the dedicated Worker with the SharedWorker (maybe supported by an optional setting like shared: true) but then I realized that it may be impossible without a major rework because the ObjectURL for the worker script can not be shared (meaning every context creates it's own URL). Is this "magic" code related to the problem? Is it a work around and did you test if it works a well with multiple browser taps that share the worker/database?

duvifn commented

@jvail Thanks for reviewing this.
You're right about the ObjectURL which its guid is unique, it is not predictable, and its lifetime is tied to the lifetime of the page that creates it. That why I'm using base64 data uri here.
The MDN documentation about the name attribute is a bit missing and it is also a bit misleading.

The spec says:

If all of the following conditions are true:

  • worker storage key equals outside storage key;
  • scope's closing flag is false;
  • scope's constructor url equals urlRecord; and
  • scope's name equals the value of option's name member
    then:
    Set worker global scope to scope.

So if the worker URL and the name attribute are equal then the same worker would be used.
The magic you linked above is used to calculate the name attribute if this is not given by a user. It's an optional attribute and in my implementation it serves as a guard against usage of different versions of spl.js by different pages (in such a case, an undebuggable problems would probably occur). The digest is unique for a binary version since it's computed on the worker source and the wasm binary.

I have tried this with pages of the same origin and it works.
I also added today an option to load the worker script from a custom URL if the user prefer this for whatever reason.

jvail commented

Thank you @duvifn,

You're right about the ObjectURL which its guid is unique, it is not predictable, and its lifetime is tied to the lifetime of the page that creates it. That why I'm using base64 data uri here.

I gave it a try with a simple example and indeed with a data uri it seems to work although on stack overflow someone complained it would set the secure context to false... It seems it's not the case.

This is all very interesting but a bit hard to digest. But the reason may well be that I have not read my own code for quite some time ;)

However a few questions:

  1. You have the WeakMap per MessagePort. Does this mean that every "tab" has its own set of databases and they can not access the dbs of a another "tab"?
  2. Do we need something else as message id, something random rather than a number?
  3. Would it be possible to demonstrate it in a simple example (maybe just a slightly modified copy of an existing one) with opening multiple taps to share a db - if that is intended (1.)
duvifn commented

@jvail

although on stack overflow someone complained it would set the secure context to false

Can you share a link? I'm not sure what does this mean in the context of a SharedWorker.

You have the WeakMap per MessagePort. Does this mean that every "tab" has its own set of databases and they can not access the dbs of a another "tab"?

No. This means that every browsing context (windows, iframes, workers, etc...) has its own set of database connections. Different browsing context can access any db managed by this worker, according to the default behavior of sqlite (detailed in my comment above).

Do we need something else as message id, something random rather than a number?

No. Every MessagePort uses its own set of message id which defined by the browsing context that post a message to the worker.

Would it be possible to demonstrate it in a simple example (maybe just a slightly modified copy of an existing one) with opening multiple taps to share a db - if that is intended (1.)

A very simple example (put the two files on a server, and import spl according to your server configuration):

File1 index.html:

<!DOCTYPE html>
<html>
<head>
<title>Test</title>
</head>
<body>
<script type="module">
  import SPL...;
  const spl = await SPL(undefined, {sharedWorker: true});
  const db = spl.db("/test");
  let script = `
        SELECT InitSpatialMetaData(1);
        CREATE TABLE tbl (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT);
        SELECT AddGeometryColumn('tbl', 'geometry', 4326, 'GEOMETRY', 'XY');
        `;
    const geom = `{"type":"Polygon", "coordinates":[[[35.021251,48.4648823],[35.0212088,48.4648072],[35.0220692,48.4645943],[35.0221524,48.4647422],[35.022051,48.4647672],[35.0220102,48.4646946],[35.0216271,48.4647893],[35.021251,48.4648823]]]}`;
    script += `\nINSERT INTO tbl (geometry) VALUES (SetSRID(GeomFromGeoJSON('${geom}'), 4326));`;
    
    
  await db.read(script);
</script>
<iframe src="iframe.html"></iframe>
</body>
</html>

File 2 iframe.html

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>New Frame</title>
</head>
<body>
<script type="module">
  import SPL...;
  (async () => {
   // Wait 5 seconds
    await new Promise((resolve)=> setTimeout(resolve, 5000));
    const spl = await SPL(undefined, {sharedWorker: true});
    const db = spl.db("/test");
    console.log(await db.exec(`SELECT * FROM tbl;`).get.first);
  })();
</script>
</body>
</html>
jvail commented

although on stack overflow someone complained it would set the secure context to false

Can you share a link? I'm not sure what does this mean in the context of a SharedWorker.

https://stackoverflow.com/questions/73098559/data-url-web-worker-loses-security-context

You have the WeakMap per MessagePort. Does this mean that every "tab" has its own set of databases and they can not access the dbs of a another "tab"?

No. This means that every browsing context (windows, iframes, workers, etc...) has its own set of database connections. Different browsing context can access any db managed by this worker, according to the default behavior of sqlite (detailed in my comment above).

Ok, I need to play a bit with your example. I thought we could maybe make all databases available to all tabs/contexts without actually knowing the path.

A very simple example (put the two files on a server, and import spl according to your server configuration):

Thank you. I'll continue from there. I always need examples to make up my mind.

jvail commented

Hi @duvifn,

apologies for the late reply: I think this is a very interesting feature and thank you for sharing your code. But I came to the conclusion that I would like to wait a little bit for various reasons:

  • I am not entirely sure if the feature will work reliably in any case with a shared database especially with multiple writes, open transactions etc. This might be a relevant post at emscripten emscripten-core/emscripten#15070 about file locking.
  • The new WASMFS for emscripten looks quite interesting so maybe we should wait for it?
  • I have seen that a new SpatiaLite version is about to arrive and next to updating all dependencies I'd like to do several other things in the code: Mainly cleaning up some of my slightly spaghettiesque code ;)
  • Could you maintain your feature within a fork and then we re-evaluate after the locking issue in emscripten is clearer and I have done a least some of my homework (I have a bit of trouble to allocate the necessary time currently)?

P.S.: Forgot to mention that as part of the little refactoring I could also split the single file into client, worker and wasm file. I'd guess this would also make implementing a SharedWoker abit easier. If you have other ideas then let me know.

duvifn commented

@jvail thanks.

But I came to the conclusion that I would like to wait a little bit for various reasons

I respect that.

I am not entirely sure if the feature will work reliable in any case with a shared database especially with multiple writes, open transactions etc. This might be a relevant post at emscripten

In my implementation multiple writes should work with implicit transactions because JS event driven architecture guarantees that only one transaction happens at a given time.
With explicit transaction it's up to the user to manage this but in case of mistake sqilte should throw an error. I have a test that ensures exactly this.

Could you maintain your feature within a fork

I continue to push commits with improvements as I need (as you can see in my branch). I actually use this branch so any issue is fixed in this branch.

Salut @duvifn,

not sure if you are still interested in working on the sharedworker idea... However, I have finally managed to put together a release so I can start thinking about the next ideas towards a version 1.0:

  • no bundling of the wasm file (share same wasm between node and browser) and worker script
  • only optionally link rttopo at runtime, dynamically if possible - then it could be GPL and LGPL licensed
  • use OPFS - here is a very interesting reading https://sqlite.org/wasm/doc/trunk/persistence.md which is also related to sharedworker and concurrent read/write access
  • use jsdoc everywhere (I managed already to get rid of ts - which I find increasingly annoying)
  • certainly some refactoring, especially in the worker code

Let me know what the status of your project is and if you are still working on it. Unfortunately I can not promise to move on much quicker... it will all take time.

Closing for now ...