run wrapper for all steps
We currently have wrappers for the download (src/run_downloaders.sh) and transform (src/run_transformers.sh) steps (think of this as Tier 1), and can use expanded bash processes that bring the rest of the pipeline (Tiers 2 and 3) together.
To run Tier 2, we need @ryanshepherd's matching code to generate the "matched_output.csv". I think all that needs to run is python superjoin.py, but there may be additional steps. @ryanshepherd - can you advise? Also, when the matching algorithm is ready to go, we should move a copy to src/match, archive the version in the little sandbox to prevent confusion, and run it from there.
Next, for Tier 3, we run everything in src/model (01_preprocess.R and 02_linear.R). These generate model predictions for the final step.
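As a rough illustration of running "everything in src/model" in order, a wrapper could loop over the numbered R scripts and rely on the shell's sorted glob expansion. This is only a sketch: the function name run_model and the assumption that each script can be run standalone with Rscript are mine, not taken from the repo.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run every numbered R script in a directory, in order.
set -eu

run_model() {
    # Assumption: scripts live in src/model unless another directory is passed.
    model_dir="${1:-src/model}"
    # The glob expands in sorted order, so 01_*.R runs before 02_*.R.
    for script in "$model_dir"/[0-9]*.R; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$script" ] || continue
        echo "Running $script"
        Rscript "$script"
    done
}

run_model "$@"
```

Because of set -e, a failing script stops the loop, so 02_linear.R never runs against a broken preprocess step.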
Finally, we run src/combine_tiers.R, which uses explicit boundaries, matched_output.csv, TIGER polygons, and the model output from Tier 3 to write a single spatial national water service boundary layer.
In summary:
run_downloaders.sh -> run_transformers.sh -> python superjoin.py + anything else @ryanshepherd indicates -> everything in order in src/model (wrap into bash) -> combine_tiers.R
- run_downloaders.sh
- run_transformers.sh
- run_match.sh (unless this is just simply python superjoin.py)
- run_model.sh (I just added this on the prelim-model branch)
- run.sh (calls everything above, then Rscript -e "source('src/combine_tiers.R');")
Then we can have one bash process to rule them all which calls all of these in succession, e.g., src/run.sh
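A minimal sketch of what src/run.sh might look like, chaining the wrappers listed above. The src/ locations of run_match.sh and run_model.sh are assumptions (the issue does not say where they live), and the DRY_RUN switch and the run_step and run_pipeline helpers are hypothetical additions so the plan can be printed without touching any data.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of src/run.sh: run each tier in order, stop at the first failure.
set -eu

# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 to actually execute the pipeline.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run_pipeline() {
    run_step bash src/run_downloaders.sh    # Tier 1: download
    run_step bash src/run_transformers.sh   # Tier 1: transform
    run_step bash src/run_match.sh          # Tier 2: matching (assumed path)
    run_step bash src/run_model.sh          # Tier 3: model (assumed path)
    run_step Rscript -e "source('src/combine_tiers.R');"
}

run_pipeline
```

Run from the repo root, something like DRY_RUN=0 bash src/run.sh would execute the steps for real; with set -e, any step that exits non-zero halts everything after it.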
@noorkb - is this something you can take on?
FYI this is all Rich's writing (hello! 👋 ), I'm just editing the blank issue Jess created.