spotify/scio

Improve error message when sparkey hits array-size limits

kellen opened this issue · 0 comments

When using saveAsSparkey, if any shard is larger than about 2 GB, you get a coder exception and an error like:

Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Required array length 2147483639 + 15534 is too large

which is not easily interpretable.

See if we can capture serialized sizes up front so that we can issue a clearer error message, such as "Increase number of sparkey shards".
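
A rough sketch of what such a check might look like, written in plain Java so it does not depend on any scio or Beam internals. Everything here is hypothetical: `ShardSizeGuard`, `checkShardSize`, and `minShards` are illustrative names, not existing scio APIs, and the sketch assumes the serialized size of each shard is already known as a byte count. The limit is taken from the error above (2147483639, which equals `Integer.MAX_VALUE - 8`).

```java
// Hypothetical helper, not part of scio: validates a shard's serialized size
// before it is materialized into a single byte array.
final class ShardSizeGuard {
    // Assumption: the effective cap is the array length reported in the
    // error message above, 2147483639 == Integer.MAX_VALUE - 8.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8L;

    // Throws a readable error instead of letting the JVM fail with
    // "Required array length ... is too large".
    static void checkShardSize(int shardIndex, long serializedBytes, int numShards) {
        if (serializedBytes > MAX_ARRAY_LENGTH) {
            throw new IllegalStateException(String.format(
                "Sparkey shard %d serializes to %d bytes, which exceeds the "
                    + "array limit of %d bytes. Increase the number of sparkey "
                    + "shards (currently %d).",
                shardIndex, serializedBytes, MAX_ARRAY_LENGTH, numShards));
        }
    }

    // Smallest shard count that keeps every shard under the limit,
    // assuming the data is spread evenly across shards.
    static int minShards(long totalSerializedBytes) {
        long shards = (totalSerializedBytes + MAX_ARRAY_LENGTH - 1) / MAX_ARRAY_LENGTH;
        return (int) Math.max(1L, shards);
    }

    public static void main(String[] args) {
        // Size taken from the error message in this issue.
        long tooLarge = 2147483639L + 15534L;
        System.out.println("Suggested minimum shards: " + minShards(tooLarge));
        checkShardSize(0, tooLarge, 1); // throws IllegalStateException
    }
}
```

The even-distribution assumption in `minShards` is optimistic; with skewed keys a single shard can still exceed the limit, so the per-shard check would remain the more reliable signal.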