pola-rs/r-polars

Do we have a way to create `object` and `struct` with classic R functions?

etiennebacher opened this issue · 8 comments

I don't think we have a way to create object and struct from our standard c() and list() but maybe I'm missing something?

It would be good to have a small table in the docs to show the equivalent (if any) of those:

>>> pl.Series(values=[1])
shape: (1,)
Series: '' [i64]
[
        1
]

> pl$Series(values = 1)
polars Series: shape: (1,)
Series: '' [f64]
[
	1.0
]

>>> pl.Series(values=[[1]])
shape: (1,)
Series: '' [list[i64]]
[
        [1]
]

> pl$Series(values = list(1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
	[1.0]
]

>>> pl.Series(values=[{1}])
shape: (1,)
Series: '' [o][object]
[
        {1}
]

???

>>> pl.Series(values=[{"a": 1}])
shape: (1,)
Series: '' [struct[1]]
[
        {1}
]

???

Are you looking for pl$Series(values = data.frame(a = 1))?

IIUC, the object type is Python-specific, not a real Apache Arrow type (so we don't support it).

Are you looking for pl$Series(values = data.frame(a = 1))?

This is equivalent to passing a list:

> pl$Series(values = data.frame(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
	[1.0]
]
> pl$Series(values = list(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
	[1.0]
]

Oh, sorry. This is the one.

r-polars/R/as_polars.R

Lines 367 to 371 in 3c0d0ec

#' @rdname as_polars_series
#' @export
as_polars_series.data.frame = function(x, name = NULL, ...) {
  pl$DataFrame(unclass(x))$to_struct(name = name)
}

Can we close this now that #1015 has been merged?
As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.
(Since R's list can contain a variety of things, we can always use the base R data.frame if we want to store something that is not supported by Apache Arrow)

As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.

That's worth mentioning in the docs, I think. I'll add it in #1014 and close this issue with that PR.

Actually it's hard to construct Struct for Series:

>>> pl.Series([{"a": 1, "b": ["x", "y"]}, {"a": 2, "b": ["z"]}])
shape: (2,)
Series: '' [struct[2]]
[
        {1,["x", "y"]}
        {2,["z"]}
]

as_polars_series(
  data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)

polars Series: shape: (2,)
Series: '' [struct[3]]
[
	{1,"x","z"}
	{2,"y","z"}
]

And it doesn't work for DataFrame:

pl$DataFrame(
  data.frame(a = 1)
)

shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

Maybe we should say that we can't reliably create a Struct from scratch and point towards $to_struct() instead

Actually it's hard to construct Struct for Series:

To create a list-type column with data.frame(), wrap the list in I().
Alternatively, use tibble::tibble() or data.table::data.table().

polars::as_polars_series(
  data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[3]]
#> [
#>  {1,"x","z"}
#>  {2,"y","z"}
#> ]

polars::as_polars_series(
  data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

polars::as_polars_series(
  tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

polars::as_polars_series(
  data.table::data.table(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

Created on 2024-04-10 with reprex v2.1.0
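
The struct[3] versus struct[2] difference above seems to come from base R rather than from polars, so it can be checked without polars at all. A minimal sketch, assuming only base R (the variable names `flat` and `nested` are made up for illustration):

```r
# A bare list passed to data.frame() is spliced into one column per element,
# and shorter elements are recycled to the common number of rows.
flat <- data.frame(a = 1:2, b = list(c("x", "y"), "z"))
ncol(flat)  # 3 columns, which is why the Series above is struct[3]

# I() marks the list as "AsIs", so data.frame() keeps it as one list column.
nested <- data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
ncol(nested)       # 2 columns, which is why the Series above is struct[2]
lengths(nested$b)  # the two elements have lengths 2 and 1
```

tibble::tibble() and data.table::data.table() keep a list argument as a single list column, which is presumably why they give the struct[2] result without I().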

And it doesn't work for DataFrame:

pl$DataFrame() works like as_polars_df() when it receives a data.frame.
(I think this behavior is worth removing because I find it confusing, but the point is that data.frame() works the same way, and in Python, polars.DataFrame.__init__() will convert a pandas.DataFrame to a polars.DataFrame, so this is consistent behavior)

polars::pl$DataFrame(data.frame(a = 1))
#> shape: (1, 1)
#> ┌─────┐
#> │ a   │
#> │ --- │
#> │ f64 │
#> ╞═════╡
#> │ 1.0 │
#> └─────┘
polars::pl$DataFrame(a = data.frame(a = 1))
#> shape: (1, 1)
#> ┌───────────┐
#> │ a         │
#> │ ---       │
#> │ struct[1] │
#> ╞═══════════╡
#> │ {1.0}     │
#> └───────────┘

Created on 2024-04-10 with reprex v2.1.0