Why does the user need to provide the types for each column?
Closed this issue · 2 comments
It seems that based on the example in the README using mtcars, the software should be easily able to provide "guesses" of the type for each variable, and spit them back to the user so that they see them, and can re-run with a specific list if it's not right.
Alternatively, there could be a helper function to generate the list of column types that is run first, modified if needed, and then passed to the estR (latentcor) function.
Thank you for the suggestion. We created a new function, get_types, that automatically determines the type of each variable and returns a vector of types compatible with the expected input to latentcor. In addition, the default for the types parameter in latentcor has been changed to NULL, so if the types are not supplied, latentcor automatically runs get_types first. However, we recommend that users supply the types explicitly if they are known in advance, because automatic determination via get_types increases the computational cost. For mtcars the increase is minor, since the dataset is small (32 samples, 11 variables):
library(microbenchmark)
microbenchmark(get_types(mtcars))
# median 497 microseconds on Mac OS with 3.1 GHz Dual-Core Intel Core i7
However, when the number of variables is large, the increase is more substantial:
X = matrix(rnorm(500 * 1000), 500, 1000)
microbenchmark(get_types(X))
# median 43 milliseconds on Mac OS with 3.1 GHz Dual-Core Intel Core i7
The cost will be even more substantial if latentcor is run as part of sub-sampling or bootstrapping routines without specifying the types explicitly, since get_types is then run again on every call. We reflect this recommendation on specifying types in the latentcor documentation for the types parameter and in the updated vignette, which shows the application of get_types to the mtcars dataset.
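To illustrate the recommendation for resampling routines, here is a minimal sketch that determines the types once and reuses them in a bootstrap loop. It assumes the latentcor package is installed and that the list returned by latentcor has an element R holding the estimated correlation matrix. The simulated data, the number of resamples (n_boot), and the loop itself are illustrative only and are not taken from the package documentation.

```r
library(latentcor)

# Simulated continuous data, smaller than the benchmark above (illustrative only)
set.seed(1)
X <- matrix(rnorm(100 * 10), 100, 10)

# Determine the types once, up front, and inspect them before use
types <- get_types(X)
print(types)

# Hypothetical bootstrap loop: passing the precomputed types explicitly
# means get_types is not re-run on every resample
n_boot <- 20  # illustrative number of resamples
boot_R <- replicate(n_boot, {
  idx <- sample(nrow(X), replace = TRUE)
  # Assumption: the returned list has an element R (the estimated correlation matrix)
  latentcor(X[idx, ], types = types)$R
}, simplify = FALSE)
```

If the guessed types look wrong for a particular dataset, the vector can be edited by hand before it is passed to latentcor.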
Exactly what I was imagining. Looks awesome.