spotify/scio

Allow largeHash* and sparkey methods to set a byte size target

kellen opened this issue · 0 comments

Estimate the size of input collections and allow users to configure a (rough) numBytes target per shard rather than numShards.
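As a rough sketch of the idea, a byte target could be turned into a shard count with a ceiling division over the estimated input size. `ShardSizing`, `shardsFor`, `estimatedBytes` and `targetBytesPerShard` are hypothetical names used for illustration only; they are not existing scio APIs, and how the size estimate is obtained is left open here.

```scala
object ShardSizing {
  // Hypothetical helper, not part of scio: derive a shard count from an
  // estimated input size and a user-supplied per-shard byte target.
  def shardsFor(estimatedBytes: Long, targetBytesPerShard: Long): Int = {
    require(estimatedBytes >= 0, "estimatedBytes must be non-negative")
    require(targetBytesPerShard > 0, "targetBytesPerShard must be positive")
    // Ceiling division without risking overflow on very large inputs.
    val full = estimatedBytes / targetBytesPerShard
    val extra = if (estimatedBytes % targetBytesPerShard != 0) 1L else 0L
    // Always produce at least one shard, and clamp to the Int range.
    math.max(1L, math.min(full + extra, Int.MaxValue.toLong)).toInt
  }
}
```

Since the estimate is only rough, the resulting shards would land near the target size rather than exactly on it.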

I propose dropping numShards completely. I also propose dropping the special handling of "unsharded" sparkey and updating sparkey reads to infer the sharding from filenames directly.
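One way the filename inference might look, as a minimal sketch: it assumes shard files follow a `part-XXXXX-of-YYYYY` naming pattern with `.spi`/`.spl` extensions, which is an assumption made for this example and should be checked against what the writer actually emits. `SparkeyLayout` and `inferNumShards` are hypothetical names, not existing scio APIs.

```scala
object SparkeyLayout {
  // Assumed naming pattern for sharded files, e.g. "part-00003-of-00010.spi".
  private val Sharded = """part-(\d+)-of-(\d+)\.(?:spi|spl)""".r

  // Hypothetical helper: returns Some(n) when the file names agree on a
  // total of n shards, None when no name matches the sharded pattern.
  def inferNumShards(fileNames: Seq[String]): Option[Int] = {
    val totals = fileNames.collect { case Sharded(_, total) => total.toInt }.distinct
    totals match {
      case Seq()  => None
      case Seq(n) => Some(n)
      case many =>
        throw new IllegalArgumentException(
          s"Conflicting shard totals in file names: ${many.mkString(", ")}")
    }
  }
}
```

With something like this, a reader would not need a separate "unsharded" code path: a listing with no matching names could simply be treated as a single file.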