ipfs/go-ds-s3

Sharding?

Closed this issue · 5 comments

By default, go-ipfs provides a sharding option for the datastore. When using this plugin the datastore is not being sharded.

As described in previous issues, the serialization in the datastore_spec is not 1:1: when I try to add a shardFunc, it results in an error.

Is there a way to achieve sharding for the data stored in S3?

Specifically, the flatfs (flat-file backed datastore) provides a sharding option because some filesystems don't handle large directories very well. None of the other datastores provide such an option.

Honestly, sharding just doesn't make sense in S3 and would massively complicate the query logic.

@Stebalien Would be great to know why you think so, as the js-ipfs plugin for s3 does support sharding and in our case it has been useful to prevent rate limiting from s3.

We really don't see a way to use this plugin as it is, without sharding.
cc @zachferland to provide any additional thoughts.

Also @Stebalien please see this discussion about why sharding is useful in s3 in general and more specifically why it is useful for IPFS s3 datastore: ipfs/js-datastore-s3#27

Interesting, I stand corrected.

In JavaScript, this isn't actually a feature of the s3 datastore itself but of a "wrapper" datastore that transforms keys. That's probably the correct way to implement this, and that implementation would live in https://github.com/ipfs/go-datastore/.
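To illustrate what a key-transforming wrapper would do, here is a minimal sketch of the key mapping alone, using only the Go standard library. The helper names (nextToLast, shardKey, unshardKey) are hypothetical and are not part of go-datastore or go-ds-s3. The shard function is modeled on my understanding of flatfs's "next-to-last" scheme, which is an assumption here, not a confirmed match for the js-ipfs layout.

```go
package main

import "strings"

// nextToLast returns the n characters that precede the final character of
// name, left-padding with "_" when name is too short.
// Assumption: this imitates the flatfs "next-to-last" shard function.
func nextToLast(name string, n int) string {
	padded := strings.Repeat("_", n+1) + name
	offset := len(padded) - n - 1
	return padded[offset : offset+n]
}

// shardKey maps a flat key such as "/CIQABCDEF" to "/<shard>/CIQABCDEF".
// A wrapper datastore would apply this before every Put, Get, Has and Delete.
func shardKey(key string, n int) string {
	name := strings.TrimPrefix(key, "/")
	return "/" + nextToLast(name, n) + "/" + name
}

// unshardKey inverts shardKey by dropping the shard component, so that
// query results can be reported under their original keys.
func unshardKey(sharded string) string {
	name := sharded[strings.LastIndex(sharded, "/")+1:]
	return "/" + name
}
```

With n = 2, shardKey("/CIQABCDEF", 2) yields "/DE/CIQABCDEF", and unshardKey maps it back to "/CIQABCDEF". Spreading keys over many prefixes like this is what would help with per-prefix rate limits in S3.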

However, it's going to be non-trivial to correctly handle queries, offsets, etc. Basically, every query would need to iterate over all shards at the same time, interleaving the results.
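The interleaving described above is essentially a k-way merge. The following is a rough sketch, assuming each shard has already returned its keys sorted by their original (unsharded) form. The types and the mergeShards function are hypothetical, and real go-datastore queries deal in result streams, filters and multiple orderings, none of which are modeled here.

```go
package main

import "container/heap"

// cursor tracks the read position within one shard's sorted result list.
type cursor struct {
	keys []string
	pos  int
}

// cursorHeap is a min-heap ordered by the key each cursor currently points at.
type cursorHeap []*cursor

func (h cursorHeap) Len() int            { return len(h) }
func (h cursorHeap) Less(i, j int) bool  { return h[i].keys[h[i].pos] < h[j].keys[h[j].pos] }
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeShards interleaves per-shard sorted results into one globally sorted
// list, then applies offset and limit (limit <= 0 means no limit).
func mergeShards(shards [][]string, offset, limit int) []string {
	h := &cursorHeap{}
	for _, s := range shards {
		if len(s) > 0 { // empty shards contribute nothing
			*h = append(*h, &cursor{keys: s})
		}
	}
	heap.Init(h)

	out := []string{}
	skipped := 0
	for h.Len() > 0 {
		if limit > 0 && len(out) == limit {
			break
		}
		c := (*h)[0] // cursor holding the smallest pending key
		key := c.keys[c.pos]
		c.pos++
		if c.pos == len(c.keys) {
			heap.Pop(h) // shard exhausted, drop its cursor
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
		if skipped < offset {
			skipped++
			continue
		}
		out = append(out, key)
	}
	return out
}
```

Note that the offset has to be applied after merging: skipping entries inside each shard separately would drop the wrong results, which is part of why the query logic gets complicated.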

If you want to submit a datastore to do this, take a look at how queries are handled in https://github.com/ipfs/go-datastore/blob/ed11f242ef104130b10a1e86728ab3779cd23c64/mount/mount.go#L209.