superfly/litefs

Object Storage

benbjohnson opened this issue · 20 comments

Many applications need to store and propagate files that are not SQLite databases (e.g. images). While it's possible to store large binary data in SQLite, it is not efficient. A better approach would be to support non-SQLite files on their own.

Files can be bundled using the same LTX format; however, they will contain a single page with all the data in the file. Files must be saved in their entirety and atomically. For files written via FUSE, this corresponds to when a file handle is closed. We may also want to provide a safer atomic HTTP API, as the FUSE approach could still have half-written files if a process dies while writing.

> still have half-written files if a process dies while writing.

Very good issue I hadn't considered. Do you suppose it would be possible to detect this situation and auto-delete the half-written file?

We could require fsync() to finalize the file. That would be a good way to detect a fully written file and then we could discard partial files on close.
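To illustrate, here's a minimal Go sketch of what a client write could look like under that proposal: write the whole object to a path on the LiteFS FUSE mount, fsync() to finalize it, then close. The mount path and file name are placeholders, and the finalize-on-fsync behavior is the proposal above, not something LiteFS does today.

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Placeholder path on a LiteFS FUSE mount.
	path := "/litefs/uploads/avatar.png"
	data := []byte("...image bytes...")

	f, err := os.Create(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write the file in its entirety; under the proposal, a file closed
	// without a successful fsync would be treated as partial and discarded.
	if _, err := f.Write(data); err != nil {
		log.Fatal(err)
	}

	// fsync() would mark the file as fully written so it can be replicated atomically.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```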

I'm surprised this is within the scope of LiteFS. I'm trying to understand the use case. Why would you use LiteFS for this, and not use something like S3 directly?

I don't so much care what tech is behind object storage (provided it works well). I'm more interested in limiting the number of service providers I'm using. So if fly says they can host files for me then I'm in.

gedw99 commented

this is a great idea..

Use cases:

  1. I need to store and process images, and I need the URI or CID of that image stored in S3 / MinIO so I can relate the image to some data in SQLite (a rough sketch follows below).
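A rough sketch of that flow, assuming the minio-go and mattn/go-sqlite3 libraries and a hypothetical `images` table; the endpoint, bucket, credentials, and paths are placeholders:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Placeholder MinIO/S3 endpoint and credentials.
	mc, err := minio.New("minio.internal:9000", &minio.Options{
		Creds: credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Upload the image and keep its object key.
	key := "images/cat.png"
	if _, err := mc.FPutObject(ctx, "uploads", key, "./cat.png",
		minio.PutObjectOptions{ContentType: "image/png"}); err != nil {
		log.Fatal(err)
	}

	// Relate the object key to a row in the SQLite database on the LiteFS mount.
	db, err := sql.Open("sqlite3", "/litefs/app.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`INSERT INTO images (object_key) VALUES (?)`, key); err != nil {
		log.Fatal(err)
	}
}
```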

Running MinIO and LiteFS together is a great combo:

  • no lock-in.
  • can run on Fly plus others.
  • BUT you're doing the ops yourself, though that's not too hard with MinIO. It now does master-master replication between MinIO servers, and failed drives are automatically removed until a new one is added, at which point it self-restores. It's pretty much hands-off these days. https://blog.min.io/minio-replication-best-practices/

You can mount volumes and map them to MinIO. You need a minimum of 4 drives for proper redundancy with MinIO, so that's 4 volumes.
I do this on Hetzner as it's cheap and they have 4 data centers spread globally now.

On Fly I think you can also mount volumes with the new Machines system?
Can you resize a volume on Fly without disconnecting it? Will that be OK with MinIO?

https://docs.hetzner.com/cloud/volumes/faq

https://fly.io/docs/app-guides/minio/

  • can resize up only: https://fly.io/docs/reference/volumes/#extend-a-volume
  • there is no underlying SAN replication, so you need to mount 4 volumes per machine running MinIO.
  • don't need to extend the file system after a volume resize.
  • But you MUST restart the Fly Machine...
  • max: ?
  • price for storage: 1 TB of provisioned capacity = 20 euro/month (so 80 euro/month for MinIO): https://fly.io/docs/about/pricing/
  • price for network inbound/outbound: 100 GB per month free | $0.02 per GB
  • bandwidth: free up to ?...

@kentcdodds I can definitely understand that perspective. I certainly like simplifying my stack, which is what drove me to SQLite/LiteFS in the first place.

My perspective is that I trust AWS S3 immensely with my data, perhaps more so than any other provider. So my optimal use case is LiteFS + backup to S3, and then just pure S3 for object storage. For something like that, having a LiteFS layer in between seems redundant and just complicates matters. But I guess I'm missing the whole backstory here.

@gedw99 I get what you're saying, but I'm personally not interested in running something like Minio myself. Too much operational complexity for my taste. (It's easy until something goes wrong, IMO. 😉)

So I guess I'm interested in an elaboration of the core issue, and what advantages this would bring compared to using S3 (or something with an S3-like API) directly?

gedw99 commented

@markuswustenberg I've been running MinIO for ages. When you're getting into high-TB territory it saves you a lot of money.

If you're not using high TB then it's pointless, and you might as well just run on someone else's S3.

But Fly does not have one, so then what is the solution?

@gedw99 Well, the solution for me is to just use AWS S3. 😊

@markuswustenberg This wasn't originally in scope for LiteFS, but we've had so many people try to build object storage into SQLite on LiteFS that it seemed useful to support it better. I think S3 is great and it is probably the right solution for a lot of people. However, if you don't have a huge number of objects, storing and serving locally is a lot easier to set up, especially for someone not familiar with the AWS ecosystem.

How far can we take this? At what point could we say that LiteFS can handle "a huge number of objects" or is that outside of scope for the future?

The biggest limitation probably isn’t technical but cost. Having all your objects stored on all your nodes is going to cost real money at a certain scale. I’m not sure what that scale is exactly. S3 has pretty cheap storage costs but their bandwidth costs are high. Do you have a ballpark idea of how many objects you’re thinking?

I'm just trying to understand the trade-offs. In the context of the Epic Stack, if folks can start their new app idea out building on top of this instead of signing up for thirty different services, and can stay running like that long enough to prove out their idea, then that's a real win.

That's the approach I'm taking with it too. As a back-of-the-envelope calculation, let's assume the objects are images that average 1MB each (which seems high). That's 1GB per 1,000 objects. If they're replicating out to 3 nodes and paying $0.15/GB/mo for volumes, then that's $0.45 per 1,000 images per month, plus any bandwidth to replicate between nodes.
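Spelled out as a tiny Go calculation, using the assumed numbers from that estimate (1MB per object, 3 replicas, $0.15/GB/mo):

```go
package main

import "fmt"

func main() {
	// Assumptions from the estimate above.
	const (
		objects     = 1000.0
		gbPerObject = 0.001 // ~1MB per object
		replicas    = 3.0   // nodes holding a full copy
		usdPerGBMo  = 0.15  // volume storage price
	)

	storageGB := objects * gbPerObject * replicas
	monthly := storageGB * usdPerGBMo
	fmt.Printf("%.0f GB of replicated storage ~ $%.2f/month (excluding replication bandwidth)\n",
		storageGB, monthly)
	// Prints: 3 GB of replicated storage ~ $0.45/month (excluding replication bandwidth)
}
```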

I think that once someone gets to 10,000 or 100,000 objects then they can begin to worry about costs and think about moving their objects to S3.

Yeah, so the path from LiteFS to S3 should be well paved (unless Fly introduces a more formal offering in the future ‼️), but starting with LiteFS is the simplest by a long shot. And it actually has some nice benefits of globally distributed files as well (which AFAIK S3 does not have).

It's very interesting to read your perspectives. From mine, the operational and cognitive load on "just" using S3 is much smaller than running it on top of the storage layer, but I can see why you could think differently.

Regarding cost, S3 storage is IMO really cheap especially with intelligent tiering turned on. Bandwidth is expensive, agreed. But at a larger scale, I would probably look into putting a CDN or something like Cloudflare's R2 in front anyway.

Looking forward to following this issue.

gedw99 commented

> It's very interesting to read your perspectives. From mine, the operational and cognitive load on "just" using S3 is much smaller than running it on top of the storage layer, but I can see why you could think differently.
>
> Regarding cost, S3 storage is IMO really cheap especially with intelligent tiering turned on. Bandwidth is expensive, agreed. But at a larger scale, I would probably look into putting a CDN or something like Cloudflare's R2 in front anyway.
>
> Looking forward to following this issue.

If you go with Cloudflare, Backblaze offers an AWS S3-compatible API with free transfers in and out of Cloudflare. It's about 4 times cheaper than AWS S3.

It's also faster than AWS, from what I read.

It would be cool if fly.io joined the Bandwidth Alliance. Then you could run the DB (LiteFS) on Fly with Backblaze for storage.

Also, more and more people are getting into Arrow / Flight SQL as a DB backed by S3. Again, if fly.io joined the Bandwidth Alliance it would be great.

I run Arrow and Flight SQL on fly.io now. With LiteFS it's a great match because you can CDC your data into Arrow. I'm not doing that yet.

Seafowl is one of the many Arrow-based systems:
https://seafowl.io/docs/getting-started/tutorial-fly-io/part-2-deploying-to-fly-io

https://www.cloudflare.com/bandwidth-alliance/