SimpleLab-Inc/wsb

run wrapper for all steps

Closed this issue · 1 comments

We currently have wrappers for the download (src/run_downloaders.sh) and transform (src/run_transformers.sh) steps (think of this as Tier 1). We can add similar bash wrappers to bring the rest of the pipeline (Tiers 2 and 3) together.

To run Tier 2, we need @ryanshepherd's matching code to generate matched_output.csv. I think all that needs to run is python superjoin.py, but there may be additional steps. @ryanshepherd - can you advise? Also, when the matching algorithm is ready to go, we should move a copy to src/match, archive the version in the little sandbox to prevent confusion, and run it from src/match.

Next, for Tier 3, we run everything in src/model (01_preprocess.R and 02_linear.R). These generate model predictions for the final step.
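A rough sketch of what a Tier 3 wrapper could look like, assuming the scripts in src/model are meant to run in filename order and that each can be run directly with Rscript. The MODEL_DIR and DRY_RUN variables are hypothetical and not part of the repo; the dry-run default is only there so the sketch is safe to try, and the real run_model.sh on the prelim-model branch may differ.

```shell
#!/usr/bin/env bash
# Sketch only: run every R script in src/model in filename order.
set -euo pipefail

# Hypothetical knobs (assumptions, not existing repo settings).
MODEL_DIR="${MODEL_DIR:-src/model}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

run_model() {
  local script
  # Glob results are sorted, so 01_preprocess.R comes before 02_linear.R.
  for script in "$MODEL_DIR"/*.R; do
    [ -e "$script" ] || continue   # skip the literal pattern if nothing matched
    echo "==> Rscript $script"
    if [ "$DRY_RUN" != "1" ]; then
      Rscript "$script"
    fi
  done
}

run_model
```

With DRY_RUN=0, set -e stops the wrapper at the first script that fails, so 02_linear.R never runs on a broken preprocess step.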

Finally, we run src/combine_tiers.R, which uses explicit boundaries, matched_output.csv, TIGER polygons, and the model output from Tier 3 to write a single national water service boundary spatial layer.

In summary:

run_downloaders.sh -> run_transformers.sh -> python superjoin.py + anything else @ryanshepherd indicates -> everything in order in src/model (wrap into bash) -> combine_tiers.R

  • run_downloaders.sh
  • run_transformers.sh
  • run_match.sh (unless this is just python superjoin.py)
  • run_model.sh (I just added this on the prelim-model branch)
  • run.sh (calls everything above, then Rscript -e "source('src/combine_tiers.R');")

Then we can have one bash process to rule them all which calls all of these in succession, e.g., src/run.sh
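For illustration, a minimal sketch of how src/run.sh might chain the steps. It assumes all wrappers live in src/ and are run from the repo root; src/run_match.sh and the location of run_model.sh are assumptions, and the run_step helper and DRY_RUN variable are hypothetical names used only in this sketch. It defaults to a dry run that prints each command; set DRY_RUN=0 to actually execute.

```shell
#!/usr/bin/env bash
# Sketch only: call each pipeline step in order, stopping on the first failure.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical: 1 = print commands only, 0 = run them

# Print the command, then run it unless this is a dry run.
run_step() {
  echo "==> $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

run_all() {
  run_step bash src/run_downloaders.sh    # Tier 1: download
  run_step bash src/run_transformers.sh   # Tier 1: transform
  run_step bash src/run_match.sh          # Tier 2: matching (assumed path)
  run_step bash src/run_model.sh          # Tier 3: model (assumed path)
  run_step Rscript -e "source('src/combine_tiers.R');"   # combine all tiers
}

run_all
```

Because of set -e, a failure in any step stops the run before combine_tiers.R, so the final layer is never written from partial inputs.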

@noorkb - is this something you can take on?

FYI this is all Rich's writing (hello! 👋 ), I'm just editing the blank issue Jess created.

Made a PR for the run.sh script