storj-archived/core

Switch to use user filesystem (or alternative) instead of KFS, better accounting of usage

Closed this issue · 11 comments

Use the filesystem (or an alternative) for storing shards in a directory named by hash. Continue to use leveldb for contracts; however, indexes can be added so that it's possible to (see the sketch after this list):

  • Search by the date that a shard will expire, for more efficient expiring of shards
  • Keep track of bytes stored, for a faster check of stored data usage
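A minimal sketch of what those indexes might look like in leveldb, assuming the `level` package; the `exp/` and `size/` key prefixes are illustrative, not from the codebase:

```ts
import { Level } from 'level';

const db = new Level<string, string>('./contracts');

// Index a shard under two extra keys: one ordered by expiry time, one by hash.
async function indexShard(hash: string, size: number, expiresAt: number) {
  await db.batch([
    // Zero-padded millisecond timestamp so keys sort chronologically.
    { type: 'put', key: `exp/${String(expiresAt).padStart(13, '0')}/${hash}`, value: hash },
    { type: 'put', key: `size/${hash}`, value: String(size) },
  ]);
}

// Expired shards are found with a bounded key scan; no shard data is read.
async function expiredShards(now: number): Promise<string[]> {
  const hashes: string[] = [];
  const upper = `exp/${String(now).padStart(13, '0')}`;
  for await (const [, hash] of db.iterator({ gte: 'exp/', lt: upper })) {
    hashes.push(hash);
  }
  return hashes;
}
```

Summing the `size/` entries (or maintaining a running total on put/delete) gives the stored-bytes figure without walking the shards themselves.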

Pros:

  • Total allocated space isn't divided by 256 (per leveldb "bucket"). This resolves the issue of the max shard size being capped at 1/256 of the allocated space.
  • Removes issues with compacting leveldb databases and shards being written to disk several times within leveldb as the data moves to different levels.
  • Resolves any memory issues that may be lingering with KFS
  • Read/write speeds will match disk speeds; leveldb typically runs at about half the speed of the disk

Cons:

  • A FAT32 filesystem has a maximum number of files

See discussion at https://github.com/Storj/dev-meetings/blob/master/2017-08-30-summary.md#shard-storage-in-storjshare

Yes, please get rid of leveldb. It's already causing headaches: storj-archived/storjshare-daemon#247. Using the regular filesystem could also make it easier to share extra mounted cloud storage in the future.

@ne0ark If you mount cloud storage in order to share cloud storage, something is completely wrong, at least in my opinion. If it's possible to get cloud space cheaper anywhere else, the question should be: why is Storj more expensive?
Or am I getting something wrong here?

The con "a FAT32 filesystem has a maximum number of files" is no real con: FAT32 is a very old filesystem and should never be used anyway.

Currently working on benchmarking alternative methods for storing data.

Requirements

  • Low memory footprint during concurrent read operations
  • Robustness (no data corruption)
  • Minimize compaction for reads and writes
  • Accurate tracking of stored bytes and bytes transferred
  • Handle a wide range of shard sizes, without a limit on max shard size
  • Optimize for shards that are multiples of 2 MiB in size
  • Target low-end systems (32-bit machines and slower HDDs)
  • Delete expired shards without reading them all

Auto-heal without waiting a gazillion hours to repair, since the downtime will drop reputation.

I agree that the fs should be used for the actual data, with a small DB used to handle the metadata. It seems there were scaling issues with using a single database for all the data (which is how things started), so the resolution was to use 256 databases instead, effectively multiplying that same problem by 256. This also introduced certain limits, such as a maximum shard size (1/256 of total space) and an upper limit of 8 TB for one node (each DB is limited in size so that you don't notice problems such as locks when it's compacting). It has removed any possibility of future-proofing, especially given the rate at which HD capacity has increased in recent years.

All of this just seems crazy on every technical level. Clearly the DB is being used beyond its intended purpose.

Storj have done some truly brilliant things, but I don't feel the storage mechanism is one of them.

It may be worth storing smaller data, less than 2 MB or so, in the contracts database or a similar database. Larger shards would be stored on the filesystem using the following scheme:

↳ <2-bytes> directory - first two bytes of the shard hash used as the directory name
  ↳ <2-bytes> directory - next two bytes of the shard hash used as the directory name
    ↳ <16-bytes> file - the remaining bytes used as the filename

This should work well across many different filesystems, though we should do benchmarking to verify.
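For illustration, deriving that path from a 40-character hex encoding of a 20-byte shard hash might look like this (function names are hypothetical):

```ts
import * as path from 'path';
import { promises as fs } from 'fs';

// 20-byte hash, hex-encoded: aabb ccdd <32 remaining hex chars>
// -> <root>/aabb/ccdd/<remaining>
function shardPath(root: string, hexHash: string): string {
  return path.join(root, hexHash.slice(0, 4), hexHash.slice(4, 8), hexHash.slice(8));
}

async function writeShard(root: string, hexHash: string, data: Buffer) {
  const file = shardPath(root, hexHash);
  await fs.mkdir(path.dirname(file), { recursive: true }); // create aabb/ccdd as needed
  await fs.writeFile(file, data);
}
```

With two bytes per directory level, each level fans out to at most 65,536 entries, which keeps individual directories at a manageable size on common filesystems.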

I would very much hope these two do not get mixed.
Actually, I don't see the problem with small files given the dir structure. If it's not avoidable, please use a separate DB for that. Many, including me, would like to keep contracts and actual data separate.

Experiment 1 https://github.com/aleitner/libmapstore
Experiment 2 https://github.com/aleitner/libfilestore
Also experimenting with storing directly in sqlite
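For reference, storing shards directly in sqlite could look roughly like this (using the `better-sqlite3` package; the schema is illustrative, not from the experiments above):

```ts
import Database from 'better-sqlite3';

const db = new Database('./shards.sqlite');
db.exec(`CREATE TABLE IF NOT EXISTS shards (
  hash TEXT PRIMARY KEY,
  size INTEGER NOT NULL,
  expires_at INTEGER NOT NULL,
  data BLOB NOT NULL
)`);

const insert = db.prepare(
  'INSERT OR REPLACE INTO shards (hash, size, expires_at, data) VALUES (?, ?, ?, ?)'
);

function putShard(hash: string, expiresAt: number, data: Buffer): void {
  insert.run(hash, data.length, expiresAt, data);
}

// Expiry and usage accounting become single statements; no shard data is read.
const purgeExpired = db.prepare('DELETE FROM shards WHERE expires_at < ?');
const bytesStored = db.prepare('SELECT COALESCE(SUM(size), 0) AS total FROM shards');
```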

👋 Hey! Thanks for this contribution. Apologies for the delay in responding!

We've decided to rearchitect Storj, so that we can scale better. You can read more about this decision here. This means that we are entirely focused on v3 at the moment, in the storj/storj repository. Our white paper for v3 is coming very, very soon - follow along on the blog and in our Rocketchat.

As this repository is part of the v2 network, we're no longer maintaining this repository. I am going to close this for now. If you have any questions, I encourage you to jump on Rocketchat and ask them there. Thanks!