pola-rs/r-polars

Polars Expression plugins for R

eitsupi opened this issue · 5 comments

We need:

  1. A mechanism for registering subnamespaces from outside the package, something like https://docs.pola.rs/py-polars/html/reference/api.html
  2. A Rust crate, something like https://github.com/pola-rs/pyo3-polars

Note: the serialization and deserialization of R objects that this may require are already defined here (I don't know whether this is sufficient):

pub fn serialize_robj(robj: Robj) -> RResult<Vec<u8>> {
    call!("serialize", &robj, NULL)
        .map_err(RPolarsErr::from)
        .bad_robj(&robj)
        .when("serializing an R object")?
        .as_raw_slice()
        .ok_or(RPolarsErr::new())
        .bad_robj(&robj)
        .when("accessing raw bytes of an serialized R object")
        .map(|bits| bits.to_vec())
}

pub fn deserialize_robj(bits: Vec<u8>) -> RResult<Robj> {
    call!("unserialize", &bits)
        .map_err(RPolarsErr::from)
        .bad_val(rdbg(bits))
        .when("deserializing an R object")
}

pub fn serialize_dataframe(dataframe: &mut polars::prelude::DataFrame) -> RResult<Vec<u8>> {
    use polars::io::SerWriter;
    let mut dump = Vec::new();
    polars::io::ipc::IpcWriter::new(&mut dump)
        .finish(dataframe)
        .map_err(polars_to_rpolars_err)?;
    Ok(dump)
}

pub fn deserialize_dataframe(bits: &[u8]) -> RResult<polars::prelude::DataFrame> {
    use polars::io::SerReader;
    polars::io::ipc::IpcReader::new(std::io::Cursor::new(bits))
        .finish()
        .map_err(polars_to_rpolars_err)
}

pub fn serialize_series(series: PSeries) -> RResult<Vec<u8>> {
    serialize_dataframe(&mut std::iter::once(series).collect())
}

pub fn deserialize_series(bits: &[u8]) -> RResult<PSeries> {
    let tn = std::any::type_name::<PSeries>();
    deserialize_dataframe(bits)?
        .get_columns()
        .split_first()
        .ok_or(RPolarsErr::new())
        .mistyped(tn)
        .and_then(|(s, r)| {
            r.is_empty()
                .then_some(s.clone())
                .ok_or(RPolarsErr::new())
                .mistyped(tn)
        })
}

  1. Mechanism for registering subnamespaces from outside the package something like docs.pola.rs/py-polars/html/reference/api.html

I was able to make this work in an implementation that I am rewriting from scratch using py-polars as a reference.
https://github.com/eitsupi/neo-r-polars/blob/afac2ae8020e4dbe3d02f7515653a574283b577a/man/polars_api_register_series_namespace.Rd#L20-L44

# s: polars series
math_shortcuts <- function(s) {
  # Create a new environment to store the methods
  self <- new.env(parent = emptyenv())

  # Store the series
  self$`_s` <- s

  # Add methods
  self$square <- function() self$`_s` * self$`_s`
  self$cube <- function() self$`_s` * self$`_s` * self$`_s`

  # Set the class
  class(self) <- "polars_namespace_series"

  # Return the environment
  self
}

polars_api_register_series_namespace("math", math_shortcuts)

s <- as_polars_series(c(1.5, 31, 42, 64.5))
s$math$square()$rename("s^2")

s <- as_polars_series(1:5)
s$math$cube()$rename("s^3")

My current concern is performance degradation caused by running for loops frequently (essentially on every single method call).
I believe the current implementation of r-polars registers all active bindings and methods once, when the package is installed, whereas this new implementation registers methods each time an R class instance is built, which would degrade performance. (Of course, if the overhead is acceptable, this is not a problem.)
https://github.com/eitsupi/neo-r-polars/blob/afac2ae8020e4dbe3d02f7515653a574283b577a/R/series-series.R#L7-L31
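
To make the trade-off concrete outside of R, here is a minimal sketch in Rust of the "register once, look up per call" alternative. It is an analogy only, not how r-polars or neo-r-polars is implemented: `Series`, `methods`, `square`, and `cube` are hypothetical names, and the values are plain `f64` vectors rather than polars series. The method table is built the first time it is needed and then shared, so constructing an instance does no per-method work.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical method signature: takes the series values, returns new values.
type Method = fn(&[f64]) -> Vec<f64>;

fn square(v: &[f64]) -> Vec<f64> {
    v.iter().map(|x| x * x).collect()
}

fn cube(v: &[f64]) -> Vec<f64> {
    v.iter().map(|x| x * x * x).collect()
}

// Shared method table, filled exactly once on first access.
static METHODS: OnceLock<HashMap<&'static str, Method>> = OnceLock::new();

fn methods() -> &'static HashMap<&'static str, Method> {
    METHODS.get_or_init(|| {
        let mut m: HashMap<&'static str, Method> = HashMap::new();
        m.insert("square", square as Method);
        m.insert("cube", cube as Method);
        m
    })
}

// Hypothetical stand-in for a series object.
struct Series {
    values: Vec<f64>,
}

impl Series {
    // Construction stores the data only; no methods are registered here.
    fn new(values: Vec<f64>) -> Self {
        Series { values }
    }

    // Each call is a single lookup in the shared table.
    fn call(&self, name: &str) -> Option<Vec<f64>> {
        methods().get(name).map(|f| f(&self.values))
    }
}
```

With this layout, `Series::new(vec![1.0, 2.0, 3.0]).call("cube")` looks up `cube` in the shared table, and an unknown name yields `None`. Whether an equivalent arrangement is practical with R environments is a separate question.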

I have looked into how expression plugins work, and it appears that this is accomplished by loading a dynamic library at runtime via the libloading crate.
https://docs.rs/libloading/latest/libloading/
https://github.com/pola-rs/polars/blob/5cad69e5d4af47e75ae0abbf88dc2bafbc8f66d2/crates/polars-plan/src/dsl/function_expr/plugin.rs#L5
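
For illustration, the kind of C ABI boundary involved might look like the sketch below. This is not the actual Polars plugin signature: `PluginFn`, `hypothetical_plugin_square`, and `call_plugin` are made-up names, and the data is a plain `f64` buffer instead of an exported series. In a real setup the host would obtain the function pointer by looking up a symbol in the loaded library (for example through libloading), and the plugin function would additionally need an unmangled symbol name (the `no_mangle` attribute) so that it can be found by name; here the pointer is passed directly to keep the example self-contained.

```rust
use std::slice;

// Hypothetical entry-point type the host expects from a plugin:
// input buffer, its length, and an output buffer of the same length.
// Returns 0 on success and a non-zero status on error.
type PluginFn = unsafe extern "C" fn(input: *const f64, len: usize, output: *mut f64) -> i32;

// Plugin side: a function using the C calling convention.
// A real plugin would also export it under an unmangled symbol name.
pub unsafe extern "C" fn hypothetical_plugin_square(
    input: *const f64,
    len: usize,
    output: *mut f64,
) -> i32 {
    if input.is_null() || output.is_null() {
        return 1;
    }
    // The caller guarantees both buffers hold `len` elements.
    let src = unsafe { slice::from_raw_parts(input, len) };
    let dst = unsafe { slice::from_raw_parts_mut(output, len) };
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s * s;
    }
    0
}

// Host side: calls through the function pointer, whatever its origin.
fn call_plugin(f: PluginFn, values: &[f64]) -> Result<Vec<f64>, i32> {
    let mut out = vec![0.0; values.len()];
    let status = unsafe { f(values.as_ptr(), values.len(), out.as_mut_ptr()) };
    if status == 0 {
        Ok(out)
    } else {
        Err(status)
    }
}
```

Calling `call_plugin(hypothetical_plugin_square, &[1.5, 2.0, 3.0])` exercises the same boundary a dynamically loaded plugin would cross; only the way the pointer is obtained differs.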

In the case of R packages, it is the static libraries, not the dynamic libraries, that are built by rustc.
Dynamic libraries are built by R.

We need to find a way to generate the proper expected C ABI on the plugin side, but this is obviously beyond my knowledge.

In the case of R packages, it is the static libraries, not the dynamic libraries, that are built by rustc. Dynamic libraries are built by R.

We need to find a way to generate the proper expected C ABI on the plugin side, but this is obviously beyond my knowledge.

The recent libr might be of use here: https://github.com/posit-dev/ark/tree/main/crates#readme

My understanding is that the dynamic libraries are built by R, so it doesn't matter which Rust crate is chosen to build the static library.
The open question is that I don't know how to expose a proper C ABI from the dynamic library created by R.