r-rudra/tidycells

Rcpp Adoption in various stages


Need to adopt {Rcpp} wherever possible.

  • in LCS
  • in as_cell_df
  • in direction attachment (enhead alternative)

LCS implemented in Rcpp
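For readers unfamiliar with the technique, here is a minimal sketch of an LCS routine in plain C++. It is not the code from this package. It assumes that LCS here means longest common subsequence (the thread does not say whether subsequence or substring is meant), and the function name `lcs` is hypothetical. In an Rcpp build one would typically include `Rcpp.h` and put a `// [[Rcpp::export]]` attribute above the function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from tidycells): longest common subsequence
// of two strings using the classic O(n * m) dynamic-programming table.
std::string lcs(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // len[i][j] = LCS length of the first i chars of a and first j chars of b.
    std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                len[i][j] = len[i - 1][j - 1] + 1;
            } else {
                len[i][j] = std::max(len[i - 1][j], len[i][j - 1]);
            }
        }
    }

    // Walk back from the bottom-right corner to recover one LCS.
    std::string out;
    std::size_t i = n;
    std::size_t j = m;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) {
            out.push_back(a[i - 1]);
            --i;
            --j;
        } else if (len[i - 1][j] >= len[i][j - 1]) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

The nested loops over characters are the kind of work that tends to be slow in interpreted R, which is why this piece is a natural candidate for compiled code.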

I noticed that Rcpp may not be optimal for large data in as_cell_df.

Maybe I need to check memory status with memory.limit().

expr                                                                       min        lq      mean     median         uq        max  neval
unpivotr::as_cells(df) %>% tidycells::as_cell_df()                   634.28278  683.8018  778.8232   781.1177   881.1816   910.1206     10
tidycells::as_cell_df(df, take_col = F) %>% tidycells::as_cell_df()  809.61787  997.8078  993.7125  1007.7760  1031.1863  1057.0500     10
as_cell_df_c2(df) %>% tidycells::as_cell_df()                         99.25656  104.2768  133.2668   113.7746   120.1005   331.1366     10
as_cell_df_r(df) %>% tidycells::as_cell_df()                         117.50674  123.1997  133.3697   128.3014   136.8042   179.8280     10
as_cell_df_r2(df) %>% tidycells::as_cell_df()                        107.92256  109.7794  116.9087   117.5692   121.1555   130.4172     10

as_cell_df_r2 may be a good option to speed up as_cell_df.
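To make the shape of this operation concrete, here is a rough sketch in plain C++ of turning a rectangular table into one record per cell. It is only an illustration under assumptions: the `Cell` struct and `to_cells` function are hypothetical, every value is treated as a string, and none of this is taken from as_cell_df_c2, as_cell_df_r or as_cell_df_r2.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: one table cell with 1-based coordinates.
struct Cell {
    int row;
    int col;
    std::string value;
};

// Hypothetical helper: flatten a column-major table (one inner vector
// per column, as in a data frame) into a long list of cells.
std::vector<Cell> to_cells(const std::vector<std::vector<std::string>>& columns) {
    std::size_t total = 0;
    for (const auto& column : columns) {
        total += column.size();
    }

    std::vector<Cell> cells;
    cells.reserve(total);  // one output record per input cell

    for (std::size_t c = 0; c < columns.size(); ++c) {
        for (std::size_t r = 0; r < columns[c].size(); ++r) {
            cells.push_back(Cell{static_cast<int>(r) + 1,
                                 static_cast<int>(c) + 1,
                                 columns[c][r]});
        }
    }
    return cells;
}
```

The output holds one record per input cell, so its size grows with rows times columns. A vectorised R version can do the same reshaping with a few whole-column operations, which may be part of why the pure R variants in the benchmark stay close to the compiled one.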

enhead alternative is not in Rcpp

Check out tidycells_nightly@Rcpp-dep

While the Rcpp LCS is great, the Rcpp is_attachable performs poorly compared with the R version. Maybe my current knowledge of C++ is not adequate to implement a performant version. Also, in my opinion, a heuristic at this level is better kept in R.

As of now, there is no way to save the result of Rcpp::cppFunction(). The only option is to create a package.

A package can't directly have an optional dependency on {Rcpp}. [In effect, it has to be in Imports at least.]

Hence the best idea is to remove the dependency. LCS is required in name_suggest, which is a small and experimental part of the package.

Implement #36

The source code of LCS can be ported as an optional module.