superfly/litefs

HTTP Query API

benbjohnson opened this issue · 25 comments

One issue we've had with LiteFS is that background job nodes usually need to write to the database but they are not the primary node themselves. They also cannot utilize the LiteFS proxy since there are no HTTP requests to redirect.

Instead, it would be useful to provide an HTTP API for queries for these types of nodes. This is similar to the approach that Turso pioneered; however, it's not intended to be the primary method of data access in LiteFS. This approach will require language-specific database drivers so we will add those on an as-needed basis.

> One issue we've had with LiteFS is that background job nodes usually need to write to the database but they are not the primary node themselves. They also cannot utilize the LiteFS proxy since there are no HTTP requests to redirect.

This perfectly describes my situation for my cache db, which is why this endpoint exists. I would switch to this instantly.

> This approach will require language-specific database drivers so we will add those on an as-needed basis.

TypeScript/Node.js first please! 🙏

My initial thought was that this would be implemented as an extension of the HTTP proxy already running in LiteFS. Is there a reason a language-specific driver would be necessary?

> My initial thought was that this would be implemented as an extension of the HTTP proxy already running in LiteFS. Is there a reason a language-specific driver would be necessary?

SQLite doesn't provide a way to proxy individual queries or transactions so we have to implement it at a higher layer (i.e. the driver) to be able to intercept queries/txns. The HTTP proxy runs at a higher layer than the driver and it doesn't work for situations where we aren't using incoming HTTP requests (e.g. background job workers).

The "driver" is going to be really thin. There are only five data types in SQLite so it can't get too complex. Turso supports a query API, batch API, & an interactive transaction API. We'll likely just support a query & batch API since the transaction API comes with a bunch of foot-guns. If people need a full transactional API, they should add an endpoint and use regular SQLite on LiteFS.

> TypeScript/Node.js first please! 🙏

The first two drivers will probably be TypeScript & Ruby. Background jobs are very common in Ruby, and TS/JS has a very big community.

I can offer feedback on the API if you'd like to share an example of what this might look like. I'm just not sure what it'll be like to use. Is it another service I need to run? Is it config?

It'll be part of the LiteFS HTTP server that currently runs on port 20202.

It'll look something like:

POST /db/query?name=my.db

Request body:

{
    "query": "SELECT foo, bar, baz FROM t WHERE x = ? AND y = ?",
    "parameters": [
        {"value":"abc"},
        {"value":"123"}
    ]
}

Response body:

{"row":["foo1", "bar1", "baz1"}}
{"row":["foo2", "bar2", "baz2"}}
...
{"changes":0, "last_insert_rowid":0} 

The changes/last_insert_rowid only apply to DML (INSERT, UPDATE, DELETE). The response body is structured as a series of separate JSON objects so we don't OOM the process with a large query result set. It can process one row at a time instead.
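Since each line of the response is a standalone JSON object, a client can consume it one row at a time. Here's a minimal Python sketch of such a consumer; note the response shape is just the proposal above, not a shipped API:

```python
import json

def parse_query_response(lines):
    """Parse the proposed streaming query response: zero or more
    {"row": [...]} objects, optionally followed by a trailing
    {"changes": ..., "last_insert_rowid": ...} summary for DML.
    Each line is decoded independently, so no full-body buffering."""
    rows, summary = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if "row" in obj:
            rows.append(obj["row"])
        else:
            summary = obj
    return rows, summary

# Using the example response body from above:
body = """{"row":["foo1", "bar1", "baz1"]}
{"row":["foo2", "bar2", "baz2"]}
{"changes":0, "last_insert_rowid":0}"""
rows, summary = parse_query_response(body.splitlines())
```

In a real client, `lines` would be the HTTP response stream rather than an in-memory string, so the memory footprint stays constant regardless of result-set size.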


The batch API may just be the same endpoint but with multiple query objects.

The actual driver code itself should have the same API as a normal database driver. It should hopefully feel the same except you'll change the driver name & it'll have higher latency.

That looks like just a regular HTTP endpoint I can hit with any language. Why do I need to have language drivers?

Yeah, it's just an HTTP endpoint. Maybe "driver" is a bad word for it. In Go, there's typically a database driver that's abstracted by the generic database/sql interface. Ruby has ActiveRecord. I guess in JavaScript it'd just be a regular old library that wraps the HTTP endpoint. It's mainly for ease of use.
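For illustration, a "driver" in that sense could be as small as the following Python sketch. The port, path, and payload shape come from the proposal earlier in this thread; the class and method names are hypothetical:

```python
import json
import urllib.request

class LiteFSClient:
    """Hypothetical thin wrapper around the proposed LiteFS HTTP
    query endpoint. Everything here mirrors the sketch in this
    thread and may change before anything ships."""

    def __init__(self, host="localhost", port=20202):
        self.base = f"http://{host}:{port}"

    def build_payload(self, query, params=()):
        # Parameters are wrapped in {"value": ...} objects per the proposal.
        return json.dumps({
            "query": query,
            "parameters": [{"value": p} for p in params],
        })

    def query(self, db, query, params=()):
        # POST the query and yield one decoded JSON object per response line.
        req = urllib.request.Request(
            f"{self.base}/db/query?name={db}",
            data=self.build_payload(query, params).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            for line in resp:
                yield json.loads(line)
```

A higher-level library could then adapt this to whatever interface is idiomatic for the language (e.g. `database/sql` in Go, or an ActiveRecord adapter in Ruby).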

Oh I see. Yeah, let me know when we get the endpoints and I can help put together the wrapper if you like.

gedw99 commented

This looks really nice. LiteFS is really getting there at making a complex race condition area blindingly simple.

I wonder if subscriptions could be supported over the HTTP transport using SSE or WS. Only saying this because the underlying LiteFS system knows what has changed, and so subscriptions are then possible.

Don't want to turn it into a GraphQL monster. But it could just be able to track presence (i.e. that you made a query a minute ago) and then know that user X made a query and is still connected, and so here are the changes to those records you queried a minute ago. It's essentially a simple CDC protocol. Again, NOT GraphQL. Someone could build GraphQL on top if that's what they wanted.

The presence aspect could also just be a TTL. You make a query with a TTL header, and so the caller is in control of how long each query acts like a subscription.

The HTTP proxy can then just keep a lookup table that maps users to queries and their TTLs.

When a record changes, it then just works out which users are affected via the lookup table and then calculates the difference.

How to calculate the difference, though :) Various approaches exist. There are probably some underlying LiteFS primitives that could be employed, perhaps.

Ah I was doing so well until I hit that last bit :)

@gedw99, that's a very interesting idea. I'm guessing @benbjohnson would prefer to discuss that in another issue. Would you mind opening up a new issue?

gedw99 commented

> @gedw99, that's a very interesting idea. I'm guessing @benbjohnson would prefer to discuss that in another issue. Would you mind opening up a new issue?

#328

Wasn't there talk/plans about write forwarding from replicas to the leader at some point, which would also cover this particular use case? I've hit the exact same issue and am now also wondering how to proceed. For now, I'll just use a static leader and only run background jobs there… 😅

@markuswustenberg We have write forwarding in v0.4.0, however, it has some downsides that are more "distributed systems" related rather than LiteFS related. The problem is that starting a transaction remotely will lock the primary (since SQLite is single-writer) and it can lock it indefinitely (or until a timeout occurs) if the replica crashes or disconnects in an undetectable way.

The HTTP Query API is a simpler approach in that it doesn't deal with transactions. It's simply an API for sending a query (or a batch of queries) that run in a single transaction. That reduces the failure modes if the replica disconnects since the transaction is started & stopped on the primary itself.

This approach doesn't work for more complex background jobs where you need to do several queries that depend on each other in the same transaction. If you start having more complex logic like that, I'd suggest opening an API on the primary node for your background job to call instead.

Aaah, I had totally missed that write forwarding doesn't happen transparently, but is something that needs to be done explicitly (found https://fly.io/docs/litefs/migrations/#write-forwarding).

So I guess the biggest difference is that for the HTTP query API, I can't have some logic decide what to do depending on which state is in the database transactionally, right?

Do you plan to keep the write forwarding? I had a look at the Go client. For my use case, I think locking on the replica but with a really tight deadline would be a fine approach as well, but I guess the deadline then also depends on the distance between leader and replica.

My services are Go services, so I'd be happy to try out any Go clients. (But I'm going on paternity leave soon, so might not make it before then.)

> So I guess the biggest difference is that for the HTTP query API, I can't have some logic decide what to do depending on which state is in the database transactionally, right?

Sort of. SQLite has some limited support for logic within queries (e.g. upserts). That's one option. I've been thinking about making user-defined functions available too but I'm not sure how that'd work quite yet.
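As a concrete example of logic within a single query: SQLite's upsert syntax (3.24+) folds a read-modify-write into one statement, which fits a one-shot query API with no interactive transaction. Sketched here with Python's built-in sqlite3 against an in-memory database (the table and columns are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

# One statement: insert a new counter, or bump it if it already exists.
# No "SELECT then INSERT-or-UPDATE" round trip is needed.
upsert = """
INSERT INTO counters (name, hits) VALUES (?, 1)
ON CONFLICT(name) DO UPDATE SET hits = hits + 1
"""
con.execute(upsert, ("home",))
con.execute(upsert, ("home",))

hits = con.execute(
    "SELECT hits FROM counters WHERE name = ?", ("home",)
).fetchone()[0]
```

Each `upsert` execution is self-contained, so the same statement could be sent over the query API without holding a lock between a read and a write.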

> Do you plan to keep the write forwarding?

Yes, we're planning on keeping it. I think there are some use cases where it makes sense. There are also some improvements we can make to it, such as heartbeating to detect if the client is still connected.

@benbjohnson may I suggest you use our format for that (which is fully documented)? The main advantage, of course, is that you can reuse our drivers and don't have to write your own. It also increases compatibility.

Thanks, @glommer. I read the spec before but I can't seem to find it now. Do you have the link by chance?

My one concern with the API was that the full result set was returned as an array. It seems like that would require a lot of RAM usage by the client since JSON libraries typically parse the full root object at a time. Have you considered using an ndjson format instead so you can stream the results?
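To illustrate the concern: a JSON array can only be decoded once the whole body is in memory, whereas ndjson lines decode independently. A tiny Python comparison (the payload shapes here are illustrative, not either project's exact wire format):

```python
import json

# Array form: the entire document must be buffered and parsed at once.
array_body = '[{"row":[1]},{"row":[2]}]'
array_rows = [obj["row"] for obj in json.loads(array_body)]

# ndjson form: each line parses on its own, so a client can process
# rows as they arrive and never hold the full result set in memory.
ndjson_body = '{"row":[1]}\n{"row":[2]}'
ndjson_rows = [json.loads(line)["row"] for line in ndjson_body.splitlines()]
```

Both forms decode to the same rows; the difference only shows up in peak memory once the result set gets large.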

https://github.com/libsql/sqld/tree/main/docs => look for the HRANA and HTTP specs. We're currently at V2.
I know there's a reason for us doing it the way we do, but I can't comment off the top of my head. However, we're of course willing to evolve this with you. Having two ways of doing the same thing benefits nobody, especially for things that require client libraries.

For example, if you implement our spec, you already get Rust, TypeScript, Python, and Go for free. We have all of that. And if there are concerns about performance (which we obviously always want to improve), we can evolve v3 together.

I'm with @glommer here. I think it would make more sense to start with something more de facto standard, and then evolve.

Provided starting with the standard doesn't preclude the evolution to something better. Which I believe is the case here.

Yeah, I'm all for not reinventing the wheel. Thanks for the links! I'll take a look at them. I don't see any reason we can't use that protocol.