Eventual-Inc/Daft

Add a .list.apply() expression


This expression would apply a function to each element of a list column.

Example:

import daft
from daft import col
df = daft.from_pydict({"a": [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]})
df.with_column("b", col("a").list.apply(daft.element().str.lower())).select("b").show()

Expected output:

╭──────────────────────────────────╮
│ b                                │
│ ---                              │
│ List[Utf8]                       │
╞══════════════════════════════════╡
│ ["hello world", "hi", "welcome"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["to", "a new world"]            │
╰──────────────────────────────────╯

(Showing first 2 of 2 rows)

Two APIs come to mind for me:

col("a").list.apply(daft.element().str.lower())
col("a").list.apply(lambda el: el.str.lower())
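The two forms don't have to be mutually exclusive: a lambda can be reduced to the placeholder form by calling it once with the placeholder. Here's a minimal pure-Python sketch of that idea. Everything in it (`Element`, `_StrOps`, `list_apply`) is a hypothetical toy for illustration and not Daft's actual implementation; it works on plain Python lists rather than a DataFrame.

```python
class Element:
    """Hypothetical placeholder for 'the current list element' (toy, not Daft's class)."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)  # functions to run on each element, in order

    @property
    def str(self):
        return _StrOps(self)

    def evaluate(self, value):
        # Run every recorded operation on a single element
        for op in self.ops:
            value = op(value)
        return value


class _StrOps:
    """Toy stand-in for a .str namespace on the placeholder."""

    def __init__(self, expr):
        self._expr = expr

    def lower(self):
        # Return a new expression with lowercasing appended
        return Element(self._expr.ops + (str.lower,))


def list_apply(rows, expr_or_fn):
    # A lambda is called once with the placeholder, which yields the same
    # kind of expression the placeholder form passes in directly.
    expr = expr_or_fn(Element()) if callable(expr_or_fn) else expr_or_fn
    return [[expr.evaluate(v) for v in row] for row in rows]


rows = [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]
by_placeholder = list_apply(rows, Element().str.lower())
by_lambda = list_apply(rows, lambda el: el.str.lower())
```

In this toy model both calls build the same expression, so supporting the lambda form costs only one extra call at construction time.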

Also, maybe we want a name other than apply, since that's already what we use for single-row UDFs? Other names that might work:

  1. I think Polars uses .eval
  2. Maybe .list.map? Since Python has the concept of mapping data.

I'm also in favor of throwing an error when an aggregation expression is provided (e.g. .list.map(lambda x: x.sum())). This probably makes the most sense paired with .list.map, since that name has a strong connotation of preserving output cardinality.
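To make the error behavior concrete, here's a rough pure-Python sketch of rejecting aggregations up front. The names (`ElementExpr`, `is_aggregation`, `list_map`) are hypothetical and chosen only for illustration; how Daft would actually detect an aggregation expression is an open question here.

```python
class ElementExpr:
    """Hypothetical element expression that remembers whether it aggregates."""

    def __init__(self, ops=(), is_aggregation=False):
        self.ops = tuple(ops)
        self.is_aggregation = is_aggregation

    def abs(self):
        # Element-wise: one value in, one value out; keeps the existing flag
        return ElementExpr(self.ops + (abs,), self.is_aggregation)

    def sum(self):
        # Aggregation: would collapse many values into one, so flag it
        return ElementExpr(self.ops + (sum,), True)

    def evaluate(self, value):
        for op in self.ops:
            value = op(value)
        return value


def list_map(rows, fn):
    expr = fn(ElementExpr())
    # Reject aggregations before touching any data, so the output always
    # has exactly one element per input element.
    if expr.is_aggregation:
        raise ValueError("list.map expects an element-wise expression, got an aggregation")
    return [[expr.evaluate(v) for v in row] for row in rows]


result = list_map([[1, -2], [-3]], lambda x: x.abs())
```

With this check, `list_map(rows, lambda x: x.sum())` fails at expression-construction time instead of silently changing the shape of the output.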

@universalmind303 could you take this issue?