Graceful Openers for Connection Timeouts?
ranchodeluxe opened this issue · 1 comment
Problem
I have .nc4 inputs on s3, and I've noticed when trying to ETL archives of >= ~5k inputs (the whole archive is about ~225k) that the function open_with_kerchunk
can bubble up any number of subclassed botocore.exceptions.ConnectionError
exceptions (such as connection timeouts), which are expected during heavy network interaction. It seems like open_with_xarray
might be prone to the same issue.
Failures like these on single inputs have the consequence of failing the whole pipeline.
Possible Solution
Is there some way to catch these connection errors across fsspec
targets, log the problem inputs, and make sure downstream transforms gracefully continue processing?
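One way to sketch this is a small retry-then-skip wrapper around whatever opener the recipe uses. This is a minimal illustration, not pangeo-forge's actual API: `open_gracefully` and its arguments are hypothetical, and the real code would pass botocore's connection-error subclasses in `catch`.

```python
import logging
import time

log = logging.getLogger("graceful_opener")

def open_gracefully(open_fn, url, retries=3, delay=0.0,
                    catch=(ConnectionError,)):
    """Try open_fn(url) up to `retries` times, swallowing the transient
    connection errors listed in `catch`; log and return None if every
    attempt fails, so the caller can skip the input instead of crashing."""
    for attempt in range(1, retries + 1):
        try:
            return open_fn(url)
        except catch as exc:
            log.warning("attempt %d/%d failed for %s: %s",
                        attempt, retries, url, exc)
            time.sleep(delay)
    log.error("giving up on %s after %d attempts", url, retries)
    return None
```

Callers would then filter out the `None` results before the downstream transforms run; whether that skipping is safe is exactly the question raised below.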
Closing this for now because skipping doesn't seem to work: downstream workflows produce index errors since they assume the skipped timesteps are present. Next ideas:
- The s3 buckets I'm using sit behind auth gateways, which may be the problem
- See how checkpoints in beam/flink work
- If all else fails we might need to catch errors and create bunk records that we can then reprocess
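For the last idea, the "bunk record" could be as simple as a placeholder object that holds the input's slot in the archive (so downstream indexing stays aligned) plus enough context to reprocess it later. A minimal sketch, where `BunkRecord` and `open_or_bunk` are hypothetical names, not anything pangeo-forge provides:

```python
from dataclasses import dataclass

@dataclass
class BunkRecord:
    """Placeholder for an input that could not be opened; it keeps the
    timestep's position in the sequence and records the failure so the
    input can be found and reprocessed later."""
    url: str
    error: str

def open_or_bunk(open_fn, url, catch=(ConnectionError,)):
    """Return the opened dataset, or a BunkRecord if opening fails."""
    try:
        return open_fn(url)
    except catch as exc:
        return BunkRecord(url=url, error=str(exc))
```

A later pass could scan the output for `BunkRecord` entries and re-run just those inputs, rather than the whole ~225k archive.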