naobservatory/mgs-workflow

Cut down working and output directories


Currently, the biggest cost of running the pipeline is not compute but storage of all the intermediate files (e.g. in S3). Review which outputs take up the most space in the working and publish directories, and reduce them as much as possible (e.g. by skipping or combining space-hungry steps).
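As a possible starting point for the review, here is a minimal Python sketch that lists the largest files under a local copy of a working or publish directory. It assumes the directory is available on a local filesystem (e.g. synced down from S3). The function names `largest_files` and `human_size` are hypothetical, and `work` is only Nextflow's default working directory name. Symlinks are skipped because Nextflow stages task inputs as symlinks by default, so following them would count the same data more than once.

```python
import heapq
import os
import sys


def largest_files(root, top_n=10):
    """Return the top_n largest regular files under root as (bytes, relative path).

    Symlinked files are skipped so staged inputs are not double-counted;
    os.walk also does not descend into symlinked directories by default.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    return heapq.nlargest(top_n, sizes)


def human_size(n_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n_bytes)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Hypothetical usage: python largest_files.py path/to/work
    target = sys.argv[1] if len(sys.argv) > 1 else "work"
    for size, rel_path in largest_files(target):
        print(f"{human_size(size):>12}  {rel_path}")
```

Ranking individual files works the same way for both directories. For the working directory, the task hash in each path can then be matched back to the process that produced it.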