ankane/ruby-polars

`Polars.date_range(...)` can't accept `Polars.col(...)` args

DeflateAwning opened this issue · 2 comments

Example

# Create the DataFrame
df = Polars::DataFrame.new({
  'id' => Polars::Series.new([1, 2, 3]),
  'date_min' => Polars::Series.new(["2020-01-01", "2021-06-15", "2022-11-01"]).cast(Polars::Date),
  'date_max' => Polars::Series.new(["2020-12-01", "2021-12-15", "2023-05-01"]).cast(Polars::Date)
})

# Add a new column with a date range
# THIS STEP does not work
df = df.with_columns(
  Polars.date_range(Polars.col('date_min'), Polars.col('date_max'), "1mo").alias('date')
)
# ERROR: RuntimeError (not found: date_min

# Next step in my code (for context):
# df = df.explode('date') # explode the date range into individual dates

Note: I haven't tested this with the exact example DataFrame above yet.

Hi @DeflateAwning, this isn't supported in Polars itself. The equivalent Python code:

import polars as pl

df = pl.DataFrame({
  'id': pl.Series([1, 2, 3]),
  'date_min': pl.Series(['2020-01-01', '2021-06-15', '2022-11-01']).cast(pl.Date),
  'date_max': pl.Series(['2020-12-01', '2021-12-15', '2023-05-01']).cast(pl.Date)
})

df.with_columns(
  pl.date_range(pl.col('date_min'), pl.col('date_max'), '1mo').alias('date')
)

raises

polars.exceptions.ComputeError: `start` must contain exactly one value, got 3 values

I requested this in the upstream (Python) project: pola-rs/polars#16084
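Until per-row ranges are supported, one possible workaround is to build the ranges in plain Ruby and skip `date_range` entirely. This is only a minimal sketch: `monthly_dates` is a hypothetical helper, not part of ruby-polars. It uses the standard library's `Date#>>`, whose month-end clamping may not match what Polars does for a "1mo" interval.

```ruby
require "date"

# Hypothetical helper (not part of ruby-polars): monthly dates from `start`
# to `finish`, inclusive. Each date is computed as an offset from `start`,
# so month-end clamping (e.g. Jan 31 -> Feb 29) does not accumulate.
def monthly_dates(start, finish)
  dates = []
  n = 0
  while (d = start >> n) <= finish
    dates << d
    n += 1
  end
  dates
end

# Same values as the example DataFrame, as plain Ruby rows.
rows = [
  { id: 1, date_min: Date.new(2020, 1, 1),  date_max: Date.new(2020, 12, 1) },
  { id: 2, date_min: Date.new(2021, 6, 15), date_max: Date.new(2021, 12, 15) },
  { id: 3, date_min: Date.new(2022, 11, 1), date_max: Date.new(2023, 5, 1) }
]

# One output row per (id, date) pair, i.e. the shape the later `explode`
# step was meant to produce.
exploded = rows.flat_map do |row|
  monthly_dates(row[:date_min], row[:date_max]).map do |d|
    { id: row[:id], date: d }
  end
end
```

The `exploded` rows could then be split into `id` and `date` arrays and passed to `Polars::DataFrame.new`, as in the original example. This runs row by row in Ruby, so it is likely to be slow on large frames.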

Perhaps we could leave this issue open and mark it as waiting until that upstream request is implemented?

Edit: I opened a new issue to track adding the appropriate pl.date_ranges() function.