Do we have a way to create `object` and `struct` with classic R functions?
etiennebacher opened this issue · 8 comments
I don't think we have a way to create `object` and `struct` from our standard `c()` and `list()`, but maybe I'm missing something?
It would be good to have a small table in the docs to show the equivalent (if any) of those:
>>> pl.Series(values=[1])
shape: (1,)
Series: '' [i64]
[
1
]
> pl$Series(values = 1)
polars Series: shape: (1,)
Series: '' [f64]
[
1.0
]
>>> pl.Series(values=[[1]])
shape: (1,)
Series: '' [list[i64]]
[
[1]
]
> pl$Series(values = list(1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
[1.0]
]
>>> pl.Series(values=[{1}])
shape: (1,)
Series: '' [o][object]
[
{1}
]
???
>>> pl.Series(values=[{"a": 1}])
shape: (1,)
Series: '' [struct[1]]
[
{1}
]
???
Are you looking for `pl$Series(values = data.frame(a = 1))`?
IIUC, the object type is Python-specific, not a real Apache Arrow type (so we don't support it).
> Are you looking for `pl$Series(values = data.frame(a = 1))`?
This is equivalent to passing a list:
> pl$Series(values = data.frame(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
[1.0]
]
> pl$Series(values = list(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
[1.0]
]
Can we close this now that #1015 has been merged?
As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.
(Since an R list can hold a variety of things, we can always use a base R `data.frame` if we want to store something that is not supported by Apache Arrow.)
> As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.
That's something worth mentioning in the docs, I think. I'll add that in #1014 and close this issue with that PR.
Actually it's hard to construct `Struct` for `Series`:
>>> pl.Series([{"a": 1, "b": ["x", "y"]}, {"a": 2, "b": ["z"]}])
shape: (2,)
Series: '' [struct[2]]
[
{1,["x", "y"]}
{2,["z"]}
]
as_polars_series(
data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)
polars Series: shape: (2,)
Series: '' [struct[3]]
[
{1,"x","z"}
{2,"y","z"}
]
And it doesn't work for `DataFrame`:
pl$DataFrame(
data.frame(a = 1)
)
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
Maybe we should say that we can't reliably create a `Struct` from scratch and point towards `$to_struct()` instead.
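For reference, a minimal sketch of the `$to_struct()` route mentioned above. It assumes the installed r-polars version exposes `<DataFrame>$to_struct()` with an optional name argument; the column values and the name `"s"` are made up for illustration.

```r
library(polars)

# Assumption: <DataFrame>$to_struct() exists in the installed r-polars version.
df <- pl$DataFrame(a = 1:2, b = c("x", "y"))

# Collapse every column of `df` into a single Series of dtype Struct.
# "s" is an arbitrary name for the resulting Series.
s <- df$to_struct("s")
s
```

Each row of the DataFrame becomes one struct value, with one field per original column.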
> Actually it's hard to construct `Struct` for `Series`:
We should use the `I()` function to create a list-type column with `data.frame()`. Or, we can use `tibble::tibble()` or `data.table::data.table()` instead.
polars::as_polars_series(
data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[3]]
#> [
#> {1,"x","z"}
#> {2,"y","z"}
#> ]
polars::as_polars_series(
data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#> {1,["x", "y"]}
#> {2,["z"]}
#> ]
polars::as_polars_series(
tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#> {1,["x", "y"]}
#> {2,["z"]}
#> ]
polars::as_polars_series(
data.table::data.table(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#> {1,["x", "y"]}
#> {2,["z"]}
#> ]
Created on 2024-04-10 with reprex v2.1.0
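The `struct[3]` result in the first call comes from base R rather than from polars: `data.frame()` splices a bare list into separate columns (recycling `"z"`), while `I()` keeps it as one list column. A small base R check, with no polars involved, shows the difference; the variable names `flat` and `nested` are only illustrative.

```r
# Base R only: compare how data.frame() treats a bare list vs. an I()-wrapped list.
flat <- data.frame(a = 1:2, b = list(c("x", "y"), "z"))
nested <- data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))

ncol(flat)         # 3: the list was spliced into two columns, "z" recycled
ncol(nested)       # 2: `b` is kept as a single list column
is.list(nested$b)  # TRUE
lengths(nested$b)  # 2 1
```

`tibble::tibble()` and `data.table::data.table()` keep list inputs as list columns without needing `I()`, which is why they produce `struct[2]` directly.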
> And it doesn't work for `DataFrame`:
`pl$DataFrame()` works like `as_polars_df()` when it receives a data.frame.
(I think this behavior is worth removing because I find it confusing, but the point is that `data.frame()` works the same way, and in Python, `polars.DataFrame.__init__()` will convert a `pandas.DataFrame` to a `polars.DataFrame`, so this is consistent behavior.)
polars::pl$DataFrame(data.frame(a = 1))
#> shape: (1, 1)
#> ┌─────┐
#> │ a   │
#> │ --- │
#> │ f64 │
#> ╞═════╡
#> │ 1.0 │
#> └─────┘
polars::pl$DataFrame(a = data.frame(a = 1))
#> shape: (1, 1)
#> ┌───────────┐
#> │ a         │
#> │ ---       │
#> │ struct[1] │
#> ╞═══════════╡
#> │ {1.0}     │
#> └───────────┘
Created on 2024-04-10 with reprex v2.1.0