f-ing drake
kaizadp opened this issue · 10 comments
drake doesn't like functions?
I get this error when I run the drake_plan. It stops at the fit_hsd_xx functions, but it works if I define the functions in my session before running the plan.
▶ target fticr_hsd_totalpeaks
x fail fticr_hsd_totalpeaks
Error: target fticr_hsd_totalpeaks failed.
diagnose(fticr_hsd_totalpeaks)$error$message:
could not find function "fit_hsd_totalpeaks"
diagnose(fticr_hsd_totalpeaks)$error$calls:
peakcounts_core %>% filter(class == "total") %>% mutate(Suction = as.character(Suction)) %>%
group_by(Suction, Homogenization, Moisture) %>% do(fit_hsd_totalpeaks(.))
base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
base::eval(quote(`_fseq`(`_lhs`)), env, env)
base::eval(quote(`_fseq`(`_lhs`)), env, env)
`_fseq`(`_lhs`)
magrittr::freduce(value, `_function_list`)
base::withVisible(function_list[k])
function_list[k]
dplyr::do(., fit_hsd_totalpeaks(.))
dplyr:::do.grouped_df(., fit_hsd_totalpeaks(.))
rlang::eval_tidy(args[[j]], mask)
Hey @kaizadp. No, drake likes functions fine. 😕 Should I clone this and try to run?
Yes, please!
Hey, so I see the immediate problem, but also a broader, more fundamental one.
The immediate problem is that, yes, you had `<-` function-definition calls inside your pipeline, which for very good but arcane technical reasons drake doesn't like. So no, drake doesn't 'like' functions defined this way, and my answer above was misleading.
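In drake, the plan should only call functions; the definitions themselves need to live in your environment (e.g. sourced from a separate script) before make() runs. A minimal sketch of that setup, with a hypothetical file name, reusing the target from the traceback above:

```r
library(drake)
library(dplyr)

# functions.R (hypothetical file) defines fit_hsd_totalpeaks() and friends;
# source it so the functions exist in the session before the plan runs
source("functions.R")

plan <- drake_plan(
  fticr_hsd_totalpeaks = peakcounts_core %>%
    filter(class == "total") %>%
    mutate(Suction = as.character(Suction)) %>%
    group_by(Suction, Homogenization, Moisture) %>%
    do(fit_hsd_totalpeaks(.))
)

make(plan)  # drake now finds fit_hsd_totalpeaks() and tracks it as a dependency
```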
But, and here we get to the bigger issue, that's because you're misusing the concept of a pipeline. Pipelines are not intended or designed to have code dumped into them en masse like this. It quickly gets unreadable if, say, you have an 800+ line pipeline 😄 Instead, drake pipelines are intended to be the high-level (or at worst medium-level) 'build steps' for your analysis.
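To illustrate what 'high-level build steps' can look like (every function and file name below is a stand-in, not code from this repo):

```r
library(drake)

# Each target is one readable step; the dependency graph falls out
# of which targets appear as arguments to later ones
plan <- drake_plan(
  raw_data   = read_fticr_data(file_in("data/fticr.csv")),
  clean_data = clean_fticr(raw_data),
  models     = fit_models(clean_data),
  report     = make_report(clean_data, models)
)
```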
For example, a huge amount of the current pipeline code is devoted to making plots. This shouldn't be here; it's confusing and distracts from one of the fundamental purposes of the pipeline, which is to provide an easy-to-see list of the steps and dependencies in your analysis. Take
gg_fticr_domains =
  data_long_trt %>%
  distinct(formula) %>%
  left_join(dplyr::select(meta, formula, class, HC, OC), by = "formula") %>%
  gg_vankrev(aes(x = OC, y = HC, color = class)) +
  scale_color_manual(values = PNWColors::pnw_palette("Sailboat")) +
  theme_kp() +
  theme(legend.position = "right") +
  NULL,
I'd strongly suggest putting the code to generate this plot elsewhere, and then just
gg_fticr_domains = make_gg_fticr_domains_plot(data_long_trt),
This makes it super clear what's happening, and what the dependency is. Again, you don't want to have to see the plot-generating code in your pipeline. It's too much detail.
Note: the above assumes that gg_fticr_domains is used, further down in the pipeline, as a dependency for something else. If it's not, don't create a target for it! In that case just replace a whole group of plot-generating steps with something like
gticr_replication_plots = make_gticr_replication_plots(...), # plots are saved to disk by function
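For concreteness, here's a sketch of what such a wrapper might look like, living in a functions file rather than in the plan (make_gg_fticr_domains_plot is the hypothetical helper name suggested above; gg_vankrev(), theme_kp(), and meta come from the existing code):

```r
# In a separate functions script, not in the plan itself
make_gg_fticr_domains_plot <- function(data_long_trt) {
  data_long_trt %>%
    distinct(formula) %>%
    left_join(dplyr::select(meta, formula, class, HC, OC), by = "formula") %>%
    gg_vankrev(aes(x = OC, y = HC, color = class)) +
    scale_color_manual(values = PNWColors::pnw_palette("Sailboat")) +
    theme_kp() +
    theme(legend.position = "right")
}
```

The plan target then shrinks to a single call, and the plot details stay out of the pipeline.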
I'm opening a PR with some basic changes that enable the pipeline to run... at least until it hits
target peakcounts_trt
fail peakcounts_trt
Error: Target `peakcounts_trt` failed. Call `diagnose(peakcounts_trt)` for details. Error message:
Join columns must be present in data.
✖ Problem with `SampleAssignment`.
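(For anyone debugging a similar "Join columns must be present in data" error: it means one side of the join lacks the named column. A quick, generic check, with x and y standing in for the actual data frames being joined:)

```r
# Both of these should be TRUE before the join; whichever is FALSE
# identifies the data frame missing the SampleAssignment column
"SampleAssignment" %in% names(x)
"SampleAssignment" %in% names(y)
```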
Hope this is useful. I'm not trying to criticize, just to help you move in the right direction. In my experience drake really is super cool for this kind of thing, and I feel terrible you've had such a crappy experience with it.
That definitely helps, and we may need to have a detailed conversation on drake and best practices.
I created this drake pipeline to feed into rmarkdown. My fticr markdowns are usually really big, and I was trying to make them more efficient -- so here, I was plotting all the graphs in drake, and just calling the relevant graphs/tables in the rmarkdown report (see the "report1" target).
OK if I work on and open a PR working on your pipeline?
Yes, of course
damn, that was a lot of edits. Thanks!
I understand what you did here, but it seems kind of a roundabout way to plot graphs -- functions -> drake plan -> feed into Markdown.
We should discuss good workflow design.
From my point of view, the markdown document is a target (probably one of the last ones) in the pipeline; it depends on everything that's come before. But if you're just producing a single markdown, yes, you could just do all the computations and analysis in the document itself; in fact, that's what I would do. Simpler and cleaner. The drake/target approach starts to get really useful if you have multiple targets (e.g. documents) that depend on an overlapping set of earlier targets.
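A sketch of that report-as-final-target pattern, with a hypothetical report path (knitr_in() and file_out() are drake's functions for tracking the Rmd source and its rendered output):

```r
library(drake)

plan <- drake_plan(
  # ...upstream targets: data, models, plots...
  report1 = rmarkdown::render(
    knitr_in("reports/fticr_report.Rmd"),  # drake scans the Rmd for loadd()/readd()
    output_file = file_out("reports/fticr_report.html"),
    quiet = TRUE
  )
)
```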
Yes, let's discuss I think.