Cut down working and output directories
willbradshaw commented
Currently, the biggest cost of running the pipeline is not compute but storage of all the intermediate files (e.g. in S3). Review the biggest space demands in the working and publish directories and try to reduce them as much as possible (e.g. by skipping or combining space-hungry steps).
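As a possible starting point for that review, here is a minimal Python sketch that walks a working or publish directory and reports the largest files and the total size per file type. It assumes the directory is on a local or mounted filesystem. Objects that live only in S3 would have to be listed through the S3 API instead. The helper names (`iter_files`, `largest_files`, `file_type`, `bytes_by_suffix`) are hypothetical and not part of the pipeline. Symlinks are skipped so that inputs staged into task directories as links (as Nextflow does by default) are not counted twice.

```python
import heapq
import os
import sys
from collections import defaultdict
from pathlib import Path

# Hypothetical list of compression suffixes treated as part of a compound type.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zst"}


def iter_files(top):
    """Yield (path, size_in_bytes) for every regular file under `top`.

    Symlinked files are skipped, and os.walk does not descend into symlinked
    directories by default, so linked inputs are not double-counted.
    """
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = Path(root) / name
            if path.is_symlink():
                continue
            yield path, path.stat().st_size


def largest_files(top, n=20):
    """Return the n largest files under `top` as (size, path), largest first."""
    return heapq.nlargest(n, ((size, str(path)) for path, size in iter_files(top)))


def file_type(path):
    """Classify a file by suffix, keeping compound types like '.fastq.gz' together."""
    suffixes = path.suffixes
    if len(suffixes) >= 2 and suffixes[-1] in COMPRESSION_SUFFIXES:
        return "".join(suffixes[-2:])
    return path.suffix or "<none>"


def bytes_by_suffix(top):
    """Sum file sizes under `top` by file type, largest total first."""
    totals = defaultdict(int)
    for path, size in iter_files(top):
        totals[file_type(path)] += size
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Directory to inspect; "work" is only an illustrative default.
    top = sys.argv[1] if len(sys.argv) > 1 else "work"
    for size, path in largest_files(top):
        print(f"{size / 1e9:8.2f} GB  {path}")
    print()
    for suffix, size in bytes_by_suffix(top).items():
        print(f"{size / 1e9:8.2f} GB  {suffix}")
```

Running it once on the working directory and once on the publish directory should point to the outputs, and therefore the steps, that dominate storage. Those are the candidates for skipping, combining, or cleaning up.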