`CppMethod` error when applying prepped UMAP recipe after saving/reading as `.rds`
There seems to be a bug in `step_umap()` when saving a prepped recipe as `.rds` and reading it back to apply it to new data.
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(tidyverse)
library(embed)
split <- seq.int(1, 150, by = 9)
tr <- iris[-split, ]
te <- iris[ split, ]
set.seed(11)
supervised <-
  recipe(Species ~ ., data = tr) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_umap(all_predictors(), outcome = vars(Species), num_comp = 2) %>%
  prep(training = tr)
write_rds(supervised, here::here(tempdir(), "umap.rds"))
saved_rec <- read_rds(here::here(tempdir(), "umap.rds"))
saved_rec %>% bake(new_data = te)
#> Error in .External(structure(list(name = "CppMethod__invoke_notvoid", : NULL value passed as symbol address
Created on 2021-08-02 by the reprex package (v2.0.0)
I'm sure this is not us (i.e. not the embed package) but I wonder if there is anything we can do about this.
The recipe is fine if you don't save it as `.rds` and then read it back.
I am very late to discovering this, but yes, this is almost certainly because of the underlying UMAP package (uwot), which uses RcppAnnoy, which itself wraps the C++ library Annoy to find approximate nearest neighbors. The RcppAnnoy objects have `save` and `load` methods that must be called, and just using `saveRDS` with them won't work (at least I couldn't get it to work). In turn, `uwot` needs to provide special functions to save and load its state, but it's all very unsatisfactory. Sorry about that. I was unable to think of a workaround.
I do intend to fix this, but my current solution involves writing an entirely new approximate nearest neighbors package. As that and maintaining `uwot` are entirely spare-time endeavors, it's taking quite a long time (3 years and counting for the nearest neighbor package). I'll get there in the end. Probably.
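For readers hitting the same error outside of recipes, the dedicated save and load functions mentioned above can be sketched roughly as follows. This is only an illustration, assuming `uwot` is installed and using the `iris` predictors as stand-in data; the file name and the `requireNamespace()` guard are choices made for this example, not something the thread prescribes.

```r
# Sketch: persist a uwot model with uwot's own save/load functions
# instead of saveRDS()/readRDS(). Guarded so it is skipped if uwot is absent.
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(11)
  x <- as.matrix(iris[, 1:4])

  # ret_model = TRUE keeps the nearest-neighbor index needed to embed new data
  model <- uwot::umap(x, n_components = 2, ret_model = TRUE)

  # Illustrative temporary file; any writable path would do
  model_file <- tempfile(fileext = ".uwot")
  uwot::save_uwot(model, file = model_file)

  # load_uwot() restores the model together with its Annoy index
  restored <- uwot::load_uwot(model_file)

  # Embed a few rows with the restored model
  new_coords <- uwot::umap_transform(x[1:5, ], restored)
  dim(new_coords)
}
```

A model round-tripped through plain `saveRDS()` and `readRDS()` would instead be expected to fail at the `umap_transform()` step, which is the same failure mode as the `bake()` error in the original report.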
Thanks for the message @jlmelville and for your work on uwot! We are also thinking about serialization for trained model objects like xgboost, torch, etc., that have native methods for saving/loading. Definitely an area that needs some attention from all of us!
This has now been solved with the new bundle package:
library(tidymodels)
library(tidyverse)
library(embed)
split <- seq.int(1, 150, by = 9)
tr <- iris[-split, ]
te <- iris[ split, ]
set.seed(11)
supervised <-
  recipe(Species ~ ., data = tr) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_umap(all_predictors(), outcome = vars(Species), num_comp = 2) %>%
  prep(training = tr)
library(bundle)
temp_file <- fs::file_temp(pattern = "umap", ext = "rds")
bundle(supervised) %>% write_rds(temp_file)
saved_rec <- read_rds(temp_file)
unbundle(saved_rec) %>% bake(new_data = te)
#> # A tibble: 17 × 3
#>    Species     UMAP1  UMAP2
#>    <fct>       <dbl>  <dbl>
#>  1 setosa      13.3    2.93
#>  2 setosa      12.0    4.69
#>  3 setosa      14.5    3.12
#>  4 setosa      13.5    3.07
#>  5 setosa      13.4    2.99
#>  6 setosa      12.0    4.86
#>  7 versicolor -10.1    8.80
#>  8 versicolor  -9.79   8.28
#>  9 versicolor  -4.91 -11.6
#> 10 versicolor  -9.66   6.12
#> 11 versicolor -10.1    6.61
#> 12 versicolor -10.3    6.98
#> 13 virginica   -4.14 -11.6
#> 14 virginica   -2.69 -12.1
#> 15 virginica   -4.06 -10.3
#> 16 virginica   -1.73 -11.5
#> 17 virginica   -2.33 -10.9
Created on 2022-09-16 with reprex v2.0.2
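If this pattern comes up repeatedly, the bundle-then-write and read-then-unbundle steps from the reprex above could be wrapped in two small helpers. This is a minimal sketch: `save_bundled()` and `load_bundled()` are hypothetical names invented for this example and are not part of bundle or embed. Only `bundle::bundle()`, `bundle::unbundle()`, `saveRDS()`, and `readRDS()` are existing functions.

```r
# Hypothetical helpers (not provided by bundle or embed) that pair
# bundle()/unbundle() with saveRDS()/readRDS().

save_bundled <- function(object, path) {
  # bundle() converts the object into a form that survives serialization
  saveRDS(bundle::bundle(object), file = path)
  invisible(path)
}

load_bundled <- function(path) {
  # unbundle() restores a usable object from the bundled form
  bundle::unbundle(readRDS(path))
}

# Usage with the objects from the reprex above (`supervised` and `te`):
# rec_file <- save_bundled(supervised, tempfile(fileext = ".rds"))
# load_bundled(rec_file) %>% bake(new_data = te)
```

Namespacing the calls as `bundle::` keeps the helpers usable without attaching the package, though bundle still has to be installed in the session that calls them.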
We should document somewhere that this step needs to be bundled for use in a new session. How do you all want to do that?
Looks like I need to get in on this bundle thing...
I think we should document it as a section, like we do with Tidying and Case weights. That way it will be easier to link to the documentation when the question pops up.
Agreed. We just did this for the parsnip engine docs.