Rewrite based on savvy and rlang
eitsupi opened this issue · 30 comments
(The following text is copied from https://github.com/eitsupi/neo-r-polars's README)
Motivation
I have been developing r-polars for over a year, and I felt that a significant rewrite was necessary.
r-polars is a clone of py-polars,
but the package structure is currently quite different.
Therefore, it was difficult to keep up with frequent updates.
I thought that now, around the release of Python Polars 1.0.0, is a good time for a complete rewrite, so I decided to try it.
There are several reasons to rewrite r-polars on both the Rust and R sides.
Rust side
- Appropriate file division. Due to the limitations of extendr,
it is not possible to place multiple impl blocks.
(extendr/extendr#538) - Error handling.
There is a lot of custom code to use the Result type with extendr, which is quite different from other packages based on extendr.
(extendr/extendr#650) - Simplify type conversion.
The code is difficult to follow because it uses a macro calledrobj_to
for type conversion (at least in rust-analyzer).
About 1 and 2, I expect that switching from extendr to savvy
will improve the situation.
For 3, in py-polars and nodejs-polars, a thin Wrap
struct wraps other types and processes them with standard From
traits etc.,
which I think makes the code cleaner.
R side
- The structure of classes.
In py-polars, the strategy is that classes defined on the Rust side (e.g.,PyDataFrame
) are wrapped by classes defined on the Python side (e.g.,DataFrame
).
In r-polars, a complex strategy is adopted to update classes created by Rust side/extendr (e.g.,RPolarsDataFrame
) with a lot of custom code.
(This is also related to the fact that extendr makes associated functions of Rust structs members of R classes. savvy does not mix associated functions and methods.) - S3 methods first.
This is also related to the Rust side, in the current r-polars, generic functions likeas_polars_series
were added later,
so there are several places where type conversion from R to Polars is done on the Rust side, making it difficult to understand where the type conversion is done.
If type conversion from R to Polars is done with two generic functions,as_polars_series
andas_polars_expr
, the code will be much simpler and customization from the R side will be possible. - Error handling.
Currently, r-polars has its own Result type on the R side, and error handling is done through it.
The backtrace generated that is quite easy to understand, but it is not necessarily easy to use when using polars internally in other packages, such astestthat::expect_error()
. - Based on
rlang
.
Currently, r-polars has no R package dependencies. This is great,
but that includes a degraded copy oflist2()
instead of the convenient functions in therlang
package.
rlang
is a lightweight R package, and I feel that it is more beneficial to depend on the convenient functions ofrlang
than to stick to no dependencies.
1 and 3 are also related to the fact that it is built with extendr, and it seems that switching to savvy is appropriate here as well.
If we abandon the current Result type on the R side, it is natural to use rlang
for error handling, so from that perspective, it is reasonable to depend on rlang
in 4.
Thanks for all the work, I'll take a look this weekend after #1147
Significant changes that may affect users are noted in the NEWS file.
https://github.com/eitsupi/neo-r-polars/blob/60502b13e9c21ef38c733abc24477671e0e13a2a/NEWS.md
I pushed the neo-r-polars' main branch to this repository so that we can start the migration work.
f7316d3
For now, the new next
branch lacks developer tools and other settings, and switching to this branch in the local repository of r-polars should detect a large number of diffs due to differences in .gitignore, so I recommend that do not switch to the branch.
Documentation for translation from Python Polars has been pushed.
I hope it comes across as fairly simple.
https://github.com/pola-rs/r-polars/blob/63790cd069cd736e650a5aabf7900f197490ece9/DEVELOPMENT.md
Thanks, that is very helpful. I noticed that the class names have changed, e.g from RPolarsExpr
to polars_expr
. Is there any reason for this? I think the current class names are already clear and unique enough to avoid collisions with classes from other packages.
Similarly, the current Struct
is called RPolarsExpr
but in the rewrite it is PlRExpr
. I like the fact that those names are identical on the R and Rust sides, is it possible to keep them like this?
Also, related to the post on Mastodon asking for help, I think it would be helpful to have a checklist (either at the top of this issue or in a new one) with more details on the progress of the rewrite. It doesn't have to be super detailed but simply have an idea of how far we are, where should a new contributor focus, etc. For instance something like this:
-
DataFrame
- translation
- tests
-
LazyFrame
- translation
- tests
-
Expr
-
str
- translation
- tests
-
arr
- ...
-
Thanks for taking a look at this. I will create the task list at another epic issue soon as I can, as it is definitely needed.
I noticed that the class names have changed, e.g from
RPolarsExpr
topolars_expr
. Is there any reason for this? I think the current class names are already clear and unique enough to avoid collisions with classes from other packages.
There are two reasons.
- To avoid conflicts between the current
polars
package and theneopolars
package. This is also why the names of the structs on the Rust side is something likePlRExpr
. This justification disappears when there is no longer a need for coexistence (i.e., when theneopolars
package is renamed topolars
). - Because the class name of CamelCase is not considered to be very common in R.
In other words, the current class names such asRPolarsExpr
are a compromise due to the restrictions of the extendr that Rust's struct and R's classes cannot be named separately, and the need to avoid class name conflicts with other packages. But if class names can be given only on the R side I feel that the snake_case is more appropriate than the CamelCase.
Of course, this is debatable.
Similarly, the current
Struct
is calledRPolarsExpr
but in the rewrite it isPlRExpr
. I like the fact that those names are identical on the R and Rust sides, is it possible to keep them like this?
No, these are actually different things. That is, polars_expr
and PlRExpr
already exist on the R side, and PlRExpr
is an internal class that is not normally touched by the user.
This is the same as Python Polars having two classes: the Expr
class and the PyExpr
class (from the Rust struct).
In Python:
>>> import polars as pl
>>> pl.lit(1).__class__.__name__
'Expr'
>>> pl.lit(1)._pyexpr.__class__.__name__
'PyExpr'
In R:
> pl$lit(1) |> class()
[1] "polars_expr" "polars_object"
> pl$lit(1)$`_rexpr` |> class()
[1] "PlRExpr"
Unlike the current RPolarsExpr
, PlRExpr
is an internal class, so I have tentatively named it PlRExpr
because we think it is safe even if it does not contain the word "polars" (although it is an internal class, it is not without risk of collision, so I have not made it RExpr
).
I see, thanks for the explanations.
There are two reasons.
- To avoid conflicts between the current polars package and the neopolars package. This is also why the names of the structs on the Rust side is something like PlRExpr. This justification disappears when there is no longer a need for coexistence (i.e., when the neopolars package is renamed to polars).
- Because the class name of CamelCase is not considered to be very common in R.
In other words, the current class names such as RPolarsExpr are a compromise due to the restrictions of the extendr that Rust's struct and R's classes cannot be named separately, and the need to avoid class name conflicts with other packages. But if class names can be given only on the R side I feel that the snake_case is more appropriate than the CamelCase.
Of course, this is debatable.
I don't think the style of class names is important. To me, the two main requirements is that they must be clear and unique, and I think RPolarsExpr
and variants match those requirements. We already break a lot of code at each release, I'd rather not break the class names in addition to that.
Bottom line, I'd rather keep the current RPolars*
names. Open to discuss too ofc.
Sidenote: I didn't find conventions for class names (in general, I feel R code lacks conventions anyway). I also didn't find packages with CamelCase class names (can't say I searched a lot though) but some prominent packages do not use snake_case:
class(Matrix::Matrix(0, 3, 2))
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
Packages using S4 (like those on BioConductor) also usually have camel case.
I didn't find conventions for class names (in general, I feel R code lacks conventions anyway).
I think the most commonly used style guides are the Tidyverse style guide and Google's Style guide (of course Google publishes Style guides for many languages and I believe they are very popular).
https://style.tidyverse.org/syntax.html#object-names
https://google.github.io/styleguide/Rguide.html
Both of these encourage the use of snake_case for class names.
Hi @eitsupi @etiennebacher ,
I am not convinced that switching from extendr
to savy
will solve the maintenance problems. First of I have taken a quick look at savvy
and neo-r-polars
. A rewire in of it's own is already a quite daunting task, and generating the bindings is the simple part. The hard part is doing integrating the code such that it feels "natural" in the R world. This is where most of the complexity of r-polars
comes from. Generating the bindings with savvy
is a lot more manual that doing so with extendr
. Let me explain:
// savvy taken from `neo-r-polars`
#[savvy]
#[derive(Clone)]
pub struct PlRDataFrame {
pub df: DataFrame,
}
impl From<DataFrame> for PlRDataFrame {
fn from(df: DataFrame) -> Self {
Self { df }
}
}
impl TryFrom<EnvironmentSexp> for &PlRDataFrame {
type Error = String;
fn try_from(env: EnvironmentSexp) -> Result<Self, String> {
let ptr = env
.get(".ptr")
.expect("Failed to get `.ptr` from the object")
.ok_or("The object is not a valid polars data frame")?;
<&PlRDataFrame>::try_from(ptr).map_err(|e| e.to_string())
}
}
// extendr from current `r-polars`
#[derive(Debug, Clone)]
pub struct RPolarsDataFrame(pub pl::DataFrame);
impl From<pl::DataFrame> for RPolarsDataFrame {
fn from(item: pl::DataFrame) -> Self {
RPolarsDataFrame(item)
}
}
The difference here is that for savvy we have to covert from R to Rust datatypes manually. While in extendr
this is done for us. This is not just the case for self defined structs but for all other datatypes as well. The rust part of the code needs to expect Sexp and R specific datatypes in function signatures and convert them to rust native types manually. This increases the complexity on the rust side while also polluting the function interface on the rust side. In extendr I can just define a function to accept a rust String
or u64
and extendr will handle the conversion.
Yes there are limitations to extendr
but I the people from extendr
are interested in the pain points of r-polars
and are willing to improve their library to our needs. I think it is a lot easier to work with them. That is what @CGMossa is suggesting. Switching to savvy
does not provide a clear benefit.
Rust side
1. Appropriate file division. Due to the limitations of [extendr](https://github.com/extendr/extendr), it is not possible to place multiple impl blocks. ([extendr/extendr#538](https://github.com/extendr/extendr/issues/538)) 2. Error handling. There is a lot of custom code to use the Result type with extendr, which is quite different from other packages based on extendr. ([extendr/extendr#650](https://github.com/extendr/extendr/issues/650))
Both of these things are things where extendr
is working on solutions that we can test and improve. extendr
wants the feedback of r-polars
for their implementation so they can move forward merging the proposals confidently.
The rest of the point are not issues with extendr
but with the current implementation on r-polars
side. There is no need for a different bindings backend to improve the state of affairs here.
3. Simplify type conversion. The code is difficult to follow because it uses a macro called `robj_to` for type conversion (at least in rust-analyzer).
Then the solution is to remove the macro, it is the same task no matter the bindings backend.
For 3, in py-polars and nodejs-polars, a thin
Wrap
struct wraps other types and processes them with standardFrom
traits etc., which I think makes the code cleaner.
I don't get your point here. extendr
is very similar to pyo3
. I know because I am maintaining an internal library that exports rust types to python
and r
at the same time.
All of the points below, are things that need to be improved on the r-polars
side of things anyways. I think it would be a lot simpler to address them step by step using extendr
without switching to savvy
. Switching to savvy
and addressing all of the points below is a lot more work that just changing things step by step. The switch to savvy will like involve replacing everything in extendr
to savvy
first and then addressing the things below, so why not just address the things below directly instead. I don't think that anything on that list is easier with savvy. It is the same amount of work either way.
R side
1. The structure of classes. In py-polars, the strategy is that classes defined on the Rust side (e.g., `PyDataFrame`) are wrapped by classes defined on the Python side (e.g., `DataFrame`). In r-polars, a complex strategy is adopted to update classes created by Rust side/extendr (e.g., `RPolarsDataFrame`) with a lot of custom code. (This is also related to the fact that extendr makes associated functions of Rust structs members of R classes. savvy does not mix associated functions and methods.) 2. S3 methods first. This is also related to the Rust side, in the current r-polars, generic functions like `as_polars_series` were added later, so there are several places where type conversion from R to Polars is done on the Rust side, making it difficult to understand where the type conversion is done. If type conversion from R to Polars is done with two generic functions, `as_polars_series` and `as_polars_expr`, the code will be much simpler and customization from the R side will be possible. 3. Error handling. Currently, r-polars has its own Result type on the R side, and error handling is done through it. The backtrace generated that is quite easy to understand, but it is not necessarily easy to use when using polars internally in other packages, such as `testthat::expect_error()`. 4. Based on `rlang`. Currently, r-polars has no R package dependencies. This is great, but that includes [a degraded copy of `list2()`](https://github.com/pola-rs/r-polars/blob/6eac27a0766d2b6ca92a72c1c7fa76eaeb58bb98/R/dotdotdot.R#L1-L20) instead of the convenient functions in the `rlang` package. `rlang` is a lightweight R package, and I feel that it is more beneficial to depend on the convenient functions of `rlang` than to stick to no dependencies.
The last point I have is, that it very unclear how savvy will play together with extendr
. What happens if another rust library that exports bindings to R using extendr
wants to export a polars
dataframe as the result of a function call. Currently this is possible, but if r-polars
uses savvy it is unclear what will happen. Savvy would probably need to enable compatibility
with extendr
which seems unlikely. Or the exporting library would need to switch to savvy
as well.
struct RDataQuery { … }
#[extendr]
impl RDataQuery {
…
pub fn fetch() -> RPolarsDataframe {
…
}
}
Anyway this are my thoughts on this proposal, if you think I am wrong and want to this either way I am not going to stop you. But I don't have the time to help with the transition. My priority is in maintaining the code I have, this is where my motivation to contribute to r-polars
is coming from. But switching to savvy adds a lot of maintenance on my end as a user of this library as well. I am willing to contribute to r-polars
and keep extendr
up to date and work with extendr
to resolve the issues r-polars
is having.
I am going to leave you with this quote, I don't know who to attribute it to, but I think it applies to many situations ;)
The sum of all problems stays constant
@ju6ge Thank you for looking at this.
The difference here is that for savvy we have to covert from R to Rust datatypes manually.
Perhaps you have a misunderstanding about savvy. savvy can take R objects based on Rust structs as arguments.
https://yutannihilation.github.io/savvy/guide/struct.html
extendr actually requires more complex processing to raise the appropriate error. Please compare the following two.
- extendr
r-polars/src/rust/src/lazy/whenthen.rs
Lines 27 to 33 in 086d6df
- savvy
r-polars/src/rust/src/functions/whenthen.rs
Lines 5 to 10 in 0da8c20
- ref: pyo3
https://github.com/pola-rs/polars/blob/bad13b383d628d56832c440da57bbe83a35fefba/crates/polars-python/src/functions/whenthen.rs#L6-L11
Here, in the next
branch implementation, we can take PlRExpr
directly as an argument, but in the main
branch, we need to try to convert it using a macro after receiving Robj
.
This is due to a limitation in the extendr implementation and cannot be removed as long as extendr is used.
Yes there are limitations to
extendr
but I the people fromextendr
are interested in the pain points ofr-polars
and are willing to improve their library to our needs. I think it is a lot easier to work with them.
I think I've been contributing to the extendr organization for the past few years, but I don't think so.
I think at least it's harder than rewriting r-polars based on savvy.
The last point I have is, that it very unclear how savvy will play together with
extendr
. What happens if another rust library that exports bindings to R usingextendr
wants to export apolars
dataframe as the result of a function call. Currently this is possible, but ifr-polars
uses savvy it is unclear what will happen. Savvy would probably need to enablecompatibility
withextendr
which seems unlikely. Or the exporting library would need to switch tosavvy
as well.
Please use the Arrow C stream interface.
https://arrow.apache.org/docs/format/CStreamInterface.html
As you are probably aware, many of the issues are major rewrites that have nothing to do with the choice between extendr and savvy.
On the other hand, even if we continue to use extendr, the effort will hardly change, so I think it is time to switch to savvy (the current code can hardly be reused, I don't understand the benefits of continuing to use extendr at all)
Let me also put my support for the continuing collaboration between r-polars and extendr.
As a side note: @ju6ge Discord is not the main source of discussion, decisions, and debates for extendr. Our GitHub issues, PRs, and discussions are the main sources for communications. I personally like to have an informal environment to discuss things liberally, but there is nothing we have going on right now, that is not summarised (and sometimes concisely) on GitHub. I think open-source is great, but we strive for open collaboration as well.
I must admit, the next 6 weeks, I'm very occupied with wrapping up my PhD thesis, but after that, I will definitely produce anything necessary on our end, to continue the collaboration.
There is a lot of history together with R-polars. We grew up together, to use a soft-phrase. I admit, I was pretty junior in the beginning, but now, I'm much more able to accommodate wishes that strain the current status quo.
For instance, the requested feature of having #[extendr]
work on multiple impl
-blocks is something I've experimented with, and debated, and suggested designs several times. We will definitely get there, there are some concessions that needs to be made, but I am very optimistic.
Lastly, I'd just like to declare, that I welcome any ping from you @ju6ge (or anyone from r-polars), and that I have now adopted the policy to always reply, regardless if I have a robust, solid, and final solution to the request given.
- extendr
r-polars/src/rust/src/lazy/whenthen.rs
Lines 27 to 33 in 086d6df
There is no reason to do this so convoluted with extendr
. You can just do it like this:
#[extendr]
impl RPolarsWhen {
pub fn new(condition: RPolarsExpr) -> RResult<RPolarsWhen> {
Ok(RPolarsWhen { inner: dsl::when(condition.inner) })
}
}
The only reason to use Robj directly is when excepting different kinds of R datatypes in the same function call and dynamically converting to something that polars
understands. If using Robj
in an function is required the same problems will apply to savvy
.
What I am talking about with the Sexpr
types is that in savvy I have seen no example of the following:
#[extendr]
impl Person {
pub fn name(&self) -> String { … }
}
Instead
#[savvy]
impl Person {
pub fn name(&self) -> savvy::Sexp { … }
}
So for savy all conversions happen in the rust part of the code, thereby removing type safety from the binding interface. Which in my opinion is very undesirable!
Good point with returning Result
. I must admit, I've seen other FFI packages that does this, and it seems very doable.
Found it: I suggest using this issue for asking which things you want us to prioritise (at October and until the end of the year). See roadmap issue here
extendr/extendr#783
I've got my own little list, that I go through, but it would be nice to get a prioritised list by actual users.
Awesome! I'm very happy with this input.
The only reason to use Robj directly is when excepting different kinds of R datatypes in the same function call and dynamically converting to something that
polars
understands.
I believe this was related to the fact that extendr cannot handle Result
.
See extendr/extendr#278.
So for savy all conversions happen in the rust part of the code, thereby removing type safety from the binding interface. Which in my opinion is very undesirable!
I'm sorry, but I don't understand what your point of view is.
I believe this was related to the fact that extendr cannot handle
Result
.
See extendr/extendr#278.
r-polars
is already using a feature flag that allows returning Result in function interface:
Lines 39 to 42 in 086d6df
So result types currently return an r dictionary with, two fields: ok
and err
if err is NULL
than no error happend and ok
contains the typed rust value. Otherwise err
contains the error message. r-polars
also already has unwrapping code to unwrap the extendr_result
class and throwing an R error if the result contains an Error.
Lines 34 to 44 in 086d6df
The unwrapping of Results
is annoying, it would be much nicer if extendr
could just do this automatically and trigger an error on the r
side of things, instead of requiring the library authors to handle this manually.
I don't see any reason why passing this result into a function again would make any sense, so using Robj
in case a result is returned from a functions seems like a bad idea.
I'm sorry, but I don't understand what your point of view is.
The Problem here is that when a functions returns Sexpr
in rust it looses all type information behind the return value. (Of the value in Sexpr
). This means that chaining rust functions on the rust side becomes a lot more complicated, because passing Sexpr between functions would be very cumbersome. And it requires manual conversion within the library code. It is much nicer to use only rust types on the rust side of things and letting extendr
to the heavy work for converting to and from r. That beeing said, it is really a pitty that extendr
does not detect bad transformations anymore. But that is something @CGMossa is willing to implement again, just better that the previous versions, that also did so via String comparision just on the extendr
side of things.
I don't see any reason why passing this result into a function again would make any sense, so using
Robj
in case a result is returned from a functions seems like a bad idea.
I don't know much about that feature as it is only used in r-polars, but I believe that the reason all functions now take Robj as an argument is because otherwise they would not be able to return a proper Result.
See #233 or something.
This means that chaining rust functions on the rust side becomes a lot more complicated, because passing Sexpr between functions would be very cumbersome.
What does that have to do with r-polars?
r-polars is a wrapper for polars, so basically there should be no situation where r-polars functions are chained together, right?
I don't know much about that feature as it is only used in r-polars, but I believe that the reason all functions now take Robj as an argument is because otherwise they would not be able to return a proper Result.
See #233 or something.
This does not make any sense. Nothing in that merge request touched the functions signatures on the rust
side of things. At least from the diff Robj
was used as function input before this. So Robj was there before. I think it mostly comes down to r-polars
using extendr
from very early on and no ever changing the implementation.
What does that have to do with r-polars?
r-polars is a wrapper for polars, so basically there should be no situation where r-polars functions are chained together, right?
I am just saying that savvy
increases the complexity of the function interface for native rust types. I don't see why this would be beneficial and that it will make calling functions within the rust side of r-polars
more complicated. You are corrects that this might not be that relevant since r-polars
mostly just wraps polars. But there might be edge cases where this will be relevant.
I am just saying that
savvy
increases the complexity of the function interface for native rust types.
If you think so, I suggest you open the issue on https://github.com/yutannihilation/savvy.
That should have a savvy philosophy.
https://yutannihilation.github.io/savvy/guide/key_ideas.html
In any case, the likelihood that I will be persuaded by you to use extendr seems infinitesimally small.
However, I don't think anyone will stop you from rewriting r-polars using extendr and it would be great if you could work with the extendr community to accomplish that.
Needless to say I have been too busy lately to complete the savvy based rewrite, so if you could do a better rewrite than I did it would be greatly appreciated.
This does not make any sense. Nothing in that merge request touched the functions signatures on the
rust
side of things. At least from the diffRobj
was used as function input before this. So Robj was there before. I think it mostly comes down tor-polars
usingextendr
from very early on and no ever changing the implementation.
Actually only @sorhawell knows why this is being used, but since @sorhawell is no longer active we can only read @sorhawell's comments.
#417 (comment)
In any case, the likelihood that I will be persuaded by you to use extendr seems infinitesimally small. However, I don't think anyone will stop you from rewriting r-polars using extendr and it would be great if you could work with the extendr community to accomplish that. Needless to say I have been too busy lately to complete the savvy based rewrite, so if you could do a better rewrite than I did it would be greatly appreciated.
Well I you are not the only person with a stake in r-polars
, so I don't need to persuade you of anything. If you wish to go forward with an rewrite with savvy
you can do so. I have provided my opinion on this here so that everybody can make an decision based on a diverse set of perspectives. Also you are the one doing the rewrite, I could just fork and keep it going from here if I really needed to. So the burden of proof that savvy actually is better for this use case than extendr
lies on you. I would rather update the codebase to use the newest version of extendr
with all of its features. I have made my case.
Can the latest version of extendr adopt the same directory structure as py-polars (which is now polars-python)? No, right?
If so, it would be less maintenance work to adopt savvy.
Maybe in a few years, we will be able to do the same thing with extendr, but we can think about it then, can't we?
It is difficult to maintain the current code base; problems like #1183 (comment) are frequent.
an the latest version of extendr adopt the same directory structure as py-polars (which is now polars-python)? No, right?
I don't see why directory structure would matter here. Python is different from R in how libraries work. The important part is that all classes from polars can be exported. That will stay the same no matter how you generate the bindings. The cruicial part is the conversion from r to rust. This is the part that is convoluted here. And R dynamic syntax is what makes this hard. Replacing the way the bindings are generated does nothing here. You still end up with a ffi interface that needs to be called with the correct data. Maybe savvy
can do this better, but I don't think so.
I don't see why directory structure would matter here.
This is because r-polars is a clone of python-polars, and it is far easier to track updates to python-polars if they have the same file structure. This is my conclusion after months of updating r-polars.
I just want to adopt savvy instead of extendr to make my work easier.
Please check #1126.
Hi,
I regularly follow this r-polars repo because of my works with both R and Python. Coming from R I have found py-polars to be exactly what I was waiting for since years in Python (dataframe library with a dplyr philosophy). I've also began to code in Rust thanks to pola.rs.
From an external point of view I've found r-polars not so easy, mainly due to the pl$... syntax.
It's an OOP question, S3 vs S4 vs R6...
Speaking about complete rewrite, have you think about a way to bypass the $ in your proposal? I've seen tidypolars and I'm not talking about that. Complete rewrite with a dplyr/pola.rs coding experience?
I mean have you considered another constructor than $? @ will almost do the same albeit with a better readibility? I've think about it and it's not an easy question.
Maybe this reflexion is somewhere here in your repo...
and of course thanks for your work!
Hi there!
From an external point of view I've found r-polars not so easy, mainly due to the pl$... syntax.
I agree that's probably deterring some people from using the "raw" polars
syntax directly in R, but I think it's also useful to be able to quickly compare Python and R code. Basically, except for some differences between languages (e.g. using c()
instead of []
), being able to only replace $
by .
(or vice-versa) is very convenient.
@ will almost do the same albeit with a better readibility?
Not convinced about that. @
is used mostly by S4-based packages. It seems most of them are in Bioconductor and plenty of people (including myself) never deal with that. $
is used in all packages built on R6
, including very popular ones like shiny
(I think).
Using S3 instead of $
or @
to chain operations would create massive namespace conflicts with the base library and other packages since polars
basically reimplements a bunch of basic functions, and not all of those functions are S3 generics.
I've seen tidypolars and I'm not talking about that
Why not? The point is precisely to use dplyr
and other tidyverse
packages syntax but with polars
under the hood. Isn't this what you're looking for?
I think the current "polars ecosystem" in R makes some sense: polars
for an almost identical experience as in python, tidypolars
to use the tidyverse syntax, and polarssql
for SQL syntax. That said, let us know if something is unclear to you and curious to see your opinion on that :)
Many thanks for you answer.
There are different point of views :)
My question was not really about technical aspects but more about user experience.
Whereas with pandas Python code was not neat, with pola.rs Python code is now easy to read.
For me polars's expressivness leads to focus on it (all verbs are in it, filter, group_by, with_columns, etc.). I agree that it's good to keep its syntax.
My opinion is that for readability purpose in R, I don't like too much $ in R code. This symbol is vertical and breaks lines visually. So my question with @ which is more like a ligation between terms.
In dplyr there are not $ and it was sort of a revolution in R. A good one with neat code. For code writers and also code readers (first time we see it in doc we don't write it).
In summary, I am wondering if r-polars can by itself bring another revolution to R but without $ $. We see it and we say: it's different and I like it.
(
as_polars_df(iris)
$with_columns((pl
$col('Sepal.Length') * 2)
$alias('Sepal.Length.Bigger'))
)
A dumb proposal may be with @
:
(
as_polars_df(iris)
@with_columns(
(pl@col('Sepal.Length') * 2)
@alias('Sepal.Length.Bigger'))
)
It also could be nice not to have to write pl$col
and just col
inside with_columns
by sort of inheritance? Find a way to skip col R function in this context.
About tidypolars my thinking is: it seems really a big work to keep tidypolars wired with r-polars evolutions forwarded by pola.rs ones... I mean evolutions are almost unpredictable in such a vast ecosystem. Maybe I'm not too fan of under the hood :)
But Rust pola.rs's seems more and more stable now, as data.table / dtplyr it's surely a good lead.
Again there are different point of views!
In summary, by no mean R api could be the same as Python api.
Thanks for reading.
My opinion is that for readability purpose in R, I don't like too much $ in R code. This symbol is vertical and breaks lines visually
I get your point, but we're not going to break all the polars code out there for what is in the end a matter of personal preferences. Some people prefer chaining functions with $
as in R6
-based packages.
It also could be nice not to have to write pl$col and just col inside with_columns by sort of inheritance? Find a way to skip col R function in this context.
It would be nice, but that introduces complexity since we'd need to understand when we are in a Polars context and when we're not. This might be trivial when we can detect that we're inside a with_columns()
call, but in some cases we create expressions outside a call and then execute them inside so preventing conflicts with the base function col()
is not as easy as it seems. Why not in the future, but for now there are many other priorities in r-polars.
About tidypolars my thinking is: it seems really a big work to keep tidypolars wired with r-polars evolutions forwarded by pola.rs ones...
Implementing it was some work, yes, but now most dplyr
and tidyr
functions are translated and I don't think they'll add many more in the future. Of course, there are still tons of functions from other packages that could be translated, but I'm not afraid of breaking changes in pola.rs affecting tidypolars
too much. Basically every tidypolars
release contains a few tweaks to accommodate pola.rs changes but nothing major (and I keep the polars
and tidypolars
releases quite synchronized).
Anyway, thanks for the feedback and don't hesitate if you have other questions or bug reports