Consider optionally moving dedup and shuffle to a second step
rom1504 opened this issue · 2 comments
rom1504 commented
If done alone, the mapping can run using only S3, CPU, and network resources.
It needs very little RAM and disk.
Doing everything in one stage makes sense when it works perfectly. Still, it might be good to offer a multi-step option for reliability.
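Here is a minimal sketch of the two-step option. It assumes items are strings, the mapping is an arbitrary per-item function, and a local directory stands in for S3. All names (`step1_map`, `step2_dedup_shuffle`) are hypothetical and are not the project's API. Step 1 only reads, maps, and writes. Step 2 holds the set of already-seen items in memory, so that is where the RAM goes. Because step 1 persists its output, step 2 can be re-run on its own if it fails.

```python
import json
import os
import random
import tempfile
from typing import Callable, Iterable, List


def step1_map(parts: Iterable[List[str]], map_fn: Callable[[str], str], out_dir: str) -> List[str]:
    """Hypothetical step 1: map each part and persist it (a local dir stands in for S3)."""
    paths = []
    for i, part in enumerate(parts):
        path = os.path.join(out_dir, f"part_{i:05d}.json")
        with open(path, "w") as f:
            json.dump([map_fn(x) for x in part], f)
        paths.append(path)
    return paths


def step2_dedup_shuffle(paths: List[str], seed: int = 0) -> List[str]:
    """Hypothetical step 2: read the persisted parts, drop duplicates globally, then shuffle."""
    seen = set()
    unique = []
    for path in paths:
        with open(path) as f:
            for item in json.load(f):
                if item not in seen:  # global dedup: keeps one copy across all parts
                    seen.add(item)
                    unique.append(item)
    random.Random(seed).shuffle(unique)  # seeded, so a re-run gives the same order
    return unique


with tempfile.TemporaryDirectory() as out_dir:
    paths = step1_map([["a", "B"], ["b", "c"]], str.lower, out_dir)
    result = step2_dedup_shuffle(paths, seed=42)
    # Re-running only step 2 on the same persisted parts reproduces the result.
    rerun = step2_dedup_shuffle(paths, seed=42)
```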
rom1504 commented
Actually, this is not really needed, since dedup is fast on smaller parts.
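Here is a minimal sketch of why a single stage can be enough. It assumes that dedup and shuffle run on each part right after mapping, so the memory used is bounded by the part size rather than by the whole dataset. The names (`process_part`, `one_stage`) are hypothetical, and the per-part scope is an assumption about what "smaller parts" means. One trade-off follows from it: this only removes duplicates within a part, so a duplicate that spans two parts survives.

```python
import random
from typing import Callable, Iterator, List


def process_part(part: List[str], map_fn: Callable[[str], str], seed: int) -> List[str]:
    # Map, dedup and shuffle one part in memory; cost is bounded by the part size.
    mapped = [map_fn(x) for x in part]
    unique = list(dict.fromkeys(mapped))  # order-preserving dedup within this part only
    random.Random(seed).shuffle(unique)
    return unique


def one_stage(parts: List[List[str]], map_fn: Callable[[str], str], seed: int = 0) -> Iterator[List[str]]:
    for i, part in enumerate(parts):
        # A distinct seed per part keeps each part's shuffle reproducible.
        yield process_part(part, map_fn, seed + i)


out = list(one_stage([["a", "A", "b"], ["b", "c"]], str.lower))
```

In this example, "b" appears in both output parts because dedup never looks across part boundaries.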