AlexIoannides/pipeliner

Example from R-bloggers not reproducible

MarkusBonsch opened this issue · 3 comments

Hi there,

Many thanks for your work and your blog post on R-Bloggers (https://www.r-bloggers.com/machine-learning-pipelines-for-r/).

However, I cannot reproduce the example. Could you provide a reproducible example of how to use pipeliner with modelr for cross-validation? In particular, the following parts produce errors saying that the dataset is missing in the call to pipeline and that cv_rmse is not defined. I also suspect that the pipe operator is incorrect:

library(tidyverse)
lm_pipeline %
   pipeline(
transform_features(function(df) {
  transmute(df, x1 = (waiting - mean(waiting)) / sd(waiting))
}), ...

cv_rmse %
mutate(model = map(train, ~ pipeline_func(as.data.frame(.x))),
     predictions = map2(model, test, ~ predict(.x, as.data.frame(.y))),
     residuals = map2(predictions, test, ~ .x - as.data.frame(.y)$eruptions),
     rmse = map_dbl(residuals, ~ sqrt(mean(.x ^ 2)))) %>%
 summarise(mean_rmse = mean(rmse), sd_rmse = sd(rmse))

See my session info below.
Thanks for your help.

Markus

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] modelr_0.1.0         dplyr_0.5.0          purrr_0.2.2          readr_1.1.0         
 [5] tidyr_0.6.1          tibble_1.2           ggplot2_2.2.1        tidyverse_1.1.1     
 [9] pipeliner_0.1.1.900  RevoUtilsMath_10.0.0 RevoUtils_10.0.2     RevoMods_10.0.0     
[13] MicrosoftML_1.0.0    mrsdeploy_1.0        RevoScaleR_9.0.1     lattice_0.20-34     
[17] rpart_4.1-10        

loaded via a namespace (and not attached):
 [1] reshape2_1.4.2         haven_1.0.0            colorspace_1.3-2       CompatibilityAPI_1.1.0
 [5] foreign_0.8-67         withr_1.0.2            DBI_0.6                readxl_0.1.1          
 [9] foreach_1.4.3          plyr_1.8.4             stringr_1.2.0          munsell_0.4.3         
[13] gtable_0.2.0           rvest_0.3.2            devtools_1.12.0        codetools_0.2-15      
[17] psych_1.6.9            memoise_1.0.0          knitr_1.15.1           forcats_0.2.0         
[21] mrupdate_1.0.0         parallel_3.3.2         curl_2.2               broom_0.4.1           
[25] Rcpp_0.12.10           scales_0.4.1           jsonlite_1.1           mnormt_1.5-5          
[29] hms_0.3                packrat_0.4.8-1        digest_0.6.12          stringi_1.1.3         
[33] grid_3.3.2             tools_3.3.2            magrittr_1.5           lazyeval_0.2.0        
[37] xml2_1.1.1             lubridate_1.6.0        assertthat_0.1         nxPacMan_1.2.1        
[41] httr_1.2.1             iterators_1.0.8        R6_2.2.0               nlme_3.1-128          
[45] git2r_0.18.0 

Hi Markus,

Thanks for getting in touch. I'm on holiday at the moment with no computer for the next 10 days, so please bear with me.

Have you tried the example directly from this GitHub repo's README? Looking at what you've pasted above it doesn't match exactly and I'm wondering if some of the formatting has been lost in the transition to R-Bloggers, etc.

Thanks,

Alex

It turns out that a '<' followed by a '>' in WordPress's code block formatter comments out everything in between, so a '<-' followed by a '%>%' results in the code in between vanishing.
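
As a rough illustration of that effect, the base R sketch below mimics a formatter that drops everything from a '<' up to the next '>'. The helper `strip_angle_spans` is hypothetical, and the two "original" lines are assumptions about what the post may have contained (the text between '<-' and '%>%' is a guess); only the mangled results are taken from the snippet quoted in this issue.

```r
# Hypothetical helper: mimic a formatter that removes everything from a '<'
# up to and including the next '>'. This is an assumption about the observed
# behaviour, not WordPress's actual implementation.
strip_angle_spans <- function(x) gsub("<[^>]*>", "", x)

# Assumed original lines; the code between '<-' and '%>%' is a guess
# made purely for illustration.
original <- c(
  "lm_pipeline <- faithful %>%",
  "cv_rmse <- crossv_kfold(faithful, 5) %>%"
)

# The assignment, the data argument and most of the pipe operator disappear,
# leaving fragments like the ones quoted in this issue.
mangled <- strip_angle_spans(original)
print(mangled)
# [1] "lm_pipeline %" "cv_rmse %"
```

This would account for all three symptoms reported above: the dataset missing from the call to pipeline, cv_rmse never being assigned, and a lone '%' where the pipe operator should be.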

This has now been fixed on the site.

Thanks for clarifying and fixing.