superfly/litefs

Object Storage

benbjohnson opened this issue · 20 comments

Many applications need to store and propagate files that are not SQLite databases (e.g. images). While it's possible to store large binary data in SQLite, it is not efficient. A better approach would be to support non-SQLite files on their own.

Files can be bundled using the same LTX format; however, they will contain a single page with all the data in the file. Files must be saved in their entirety and atomically. For files written via FUSE, this corresponds to when a file handle is closed. We may also want to provide a safer atomic HTTP API, as the FUSE approach could still have half-written files if a process dies while writing.

> still have half-written files if a process dies while writing.

Very good issue I hadn't considered. Do you suppose it would be possible to detect this situation and auto-delete the half-written file?

We could require fsync() to finalize the file. That would be a good way to detect a fully written file and then we could discard partial files on close.
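To illustrate, here's a minimal Go sketch of what a client write could look like under that proposal: write the whole object to a path on the LiteFS FUSE mount, fsync() to finalize it, then close. The mount path and file name are placeholders, and the finalize-on-fsync behavior is the proposal above, not something LiteFS does today.

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Placeholder path on a LiteFS FUSE mount.
	path := "/litefs/uploads/avatar.png"
	data := []byte("...image bytes...")

	f, err := os.Create(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write the file in its entirety; under the proposal, a file closed
	// without a successful fsync would be treated as partial and discarded.
	if _, err := f.Write(data); err != nil {
		log.Fatal(err)
	}

	// fsync() would mark the file as fully written so it can be replicated atomically.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```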

I'm surprised this is within the scope of LiteFS. I'm trying to understand the use case. Why would you use LiteFS for this, and not use something like S3 directly?

I don't so much care what tech is behind object storage (provided it works well). I'm more interested in limiting the number of service providers I'm using. So if fly says they can host files for me then I'm in.

gedw99 commented

this is a great idea..

Use cases:

  1. I need to store and process images, and I need the URI or CID of that image stored in S3 / MinIO so I can relate the image to some data in SQLite (a rough sketch follows below).
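A rough sketch of that flow, assuming the minio-go and mattn/go-sqlite3 libraries and a hypothetical `images` table; the endpoint, bucket, credentials, and paths are placeholders:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Placeholder MinIO/S3 endpoint and credentials.
	mc, err := minio.New("minio.internal:9000", &minio.Options{
		Creds: credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Upload the image and keep its object key.
	key := "images/cat.png"
	if _, err := mc.FPutObject(ctx, "uploads", key, "./cat.png",
		minio.PutObjectOptions{ContentType: "image/png"}); err != nil {
		log.Fatal(err)
	}

	// Relate the object key to a row in the SQLite database on the LiteFS mount.
	db, err := sql.Open("sqlite3", "/litefs/app.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`INSERT INTO images (object_key) VALUES (?)`, key); err != nil {
		log.Fatal(err)
	}
}
```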

Running MinIO and LiteFS together is a great combo:

  • no lock-in.
  • can run on Fly plus others.
  • BUT you're doing the ops yourself, though that's not too hard with MinIO. It now does master-master replication between MinIO servers, and failed drives are automatically removed until a new one is added, at which point it self-restores. It's pretty much hands-off these days. https://blog.min.io/minio-replication-best-practices/

You can mount volumes and map them to MinIO. You need a minimum of 4 drives for proper redundancy with MinIO, so that's 4 volumes.
I do this on Hetzner as it's cheap and they have 4 data centers spread globally now.

On Fly I think you can also mount volumes with the new Machines system?
Can you resize a volume on Fly without disconnecting it? Will that be OK with MinIO?

https://docs.hetzner.com/cloud/volumes/faq

https://fly.io/docs/app-guides/minio/

  • can resize up only: https://fly.io/docs/reference/volumes/#extend-a-volume
  • there is no underlying SAN replication, so you need to mount 4 volumes per machine running MinIO.
  • don't need to extend the file system after a volume resize.
  • But you MUST restart the Fly Machine...
  • max: ?
  • price for storage: 1 TB of provisioned capacity = 20 euro/month (so 80 euro/month for MinIO): https://fly.io/docs/about/pricing/
  • price for network inbound/outbound: 100 GB per month free | $0.02 per GB
  • bandwidth: free up to ?...

@kentcdodds I can definitely understand that perspective. I certainly like simplifying my stack, which is what drove me to SQLite/LiteFS in the first place.

My perspective is that I trust AWS S3 immensely with my data, perhaps more so than any other provider. So my optimal use case is LiteFS + backup to S3, and then just pure S3 for object storage. For something like that, having a LiteFS layer in between seems redundant and just complicates matters. But I guess I'm missing the whole backstory here.

@gedw99 I get what you're saying, but I'm personally not interested in running something like Minio myself. Too much operational complexity for my taste. (It's easy until something goes wrong, IMO. 😉)

So I guess I'm interested in an elaboration of the core issue, and what advantages this would bring compared to using S3 (or something with an S3-like API) directly?

gedw99 commented

@markuswustenberg I've been running MinIO for ages. When you're getting into high-TB territory it saves you a lot of money.

If you're not using high TB then it's pointless, and you might as well just run on someone else's S3.

But Fly does not have one, so then what is the solution?

@gedw99 Well, the solution for me is to just use AWS S3. 😊

@markuswustenberg This wasn't originally in scope for LiteFS, but we've had so many people try to build object storage into SQLite on LiteFS that it seemed useful to support it better. I think S3 is great and it is probably the right solution for a lot of people. However, if you don't have a huge number of objects, storing and serving locally is a lot easier to set up, especially for someone not familiar with the AWS ecosystem.

How far can we take this? At what point could we say that LiteFS can handle "a huge number of objects" or is that outside of scope for the future?

The biggest limitation probably isn’t technical but cost. Having all your objects stored on all your nodes is going to cost real money at a certain scale. I’m not sure what that scale is exactly. S3 has pretty cheap storage costs but their bandwidth costs are high. Do you have a ballpark idea of how many objects you’re thinking?

I'm just trying to understand the trade-offs. In the context of the Epic Stack, if folks can start their new app idea out building on top of this instead of signing up for thirty different services, and can stay running like that long enough to prove out their idea, then that's a real win.

That's the approach I'm taking with it too. As a back-of-the-envelope calculation, let's assume the objects are images that average 1MB each (which seems high). That's 1GB per 1,000 objects. If they're replicating out to 3 nodes and paying $0.15/GB/mo for volumes, then that's $0.45 per 1,000 images per month, plus any bandwidth to replicate between nodes.
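Spelled out as a tiny Go calculation, using the assumed numbers from that estimate (1MB per object, 3 replicas, $0.15/GB/mo):

```go
package main

import "fmt"

func main() {
	// Assumptions from the estimate above.
	const (
		objects     = 1000.0
		gbPerObject = 0.001 // ~1MB per object
		replicas    = 3.0   // nodes holding a full copy
		usdPerGBMo  = 0.15  // volume storage price
	)

	storageGB := objects * gbPerObject * replicas
	monthly := storageGB * usdPerGBMo
	fmt.Printf("%.0f GB of replicated storage ~ $%.2f/month (excluding replication bandwidth)\n",
		storageGB, monthly)
	// Prints: 3 GB of replicated storage ~ $0.45/month (excluding replication bandwidth)
}
```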

I think that once someone gets to 10,000 or 100,000 objects then they can begin to worry about costs and think about moving their objects to S3.

Yeah, so the path from LiteFS to S3 should be well paved (unless Fly introduces a more formal offering in the future ‼️), but starting with LiteFS is the simplest by a long shot. And it actually has some nice benefits of globally distributed files as well (which AFAIK S3 does not have).

It's very interesting to read your perspectives. From mine, the operational and cognitive load on "just" using S3 is much smaller than running it on top of the storage layer, but I can see why you could think differently.

Regarding cost, S3 storage is IMO really cheap especially with intelligent tiering turned on. Bandwidth is expensive, agreed. But at a larger scale, I would probably look into putting a CDN or something like Cloudflare's R2 in front anyway.

Looking forward to following this issue.

gedw99 commented

> It's very interesting to read your perspectives. From mine, the operational and cognitive load on "just" using S3 is much smaller than running it on top of the storage layer, but I can see why you could think differently.
>
> Regarding cost, S3 storage is IMO really cheap especially with intelligent tiering turned on. Bandwidth is expensive, agreed. But at a larger scale, I would probably look into putting a CDN or something like Cloudflare's R2 in front anyway.
>
> Looking forward to following this issue.

If you go with Cloudflare, Backblaze offers an AWS S3-compatible API with free transfers in and out of Cloudflare. It's about 4 times cheaper than AWS S3.

It's also faster than AWS, from what I read.

It would be cool if fly.io joined the Bandwidth Alliance. Then you could run the DB (LiteFS) on Fly with Backblaze for storage.

Also, more and more people are getting into Arrow / Flight SQL as a DB backed by S3. Again, if fly.io joined the Bandwidth Alliance it would be great.

I run Arrow and Flight SQL on fly.io now. With LiteFS it's a great match because you can CDC your data into Arrow. I'm not doing that yet.

Seafowl is one of the many Arrow-based systems:
https://seafowl.io/docs/getting-started/tutorial-fly-io/part-2-deploying-to-fly-io

https://www.cloudflare.com/bandwidth-alliance/