Add a .list.apply() expression
MisterKloudy commented
This would apply a function to each element in the list.
Example:
import daft
from daft import col
df = daft.from_pydict({"a": [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]})
df.with_column("b", col("a").list.apply(daft.element().str.lower())).select("b").show()
Expected output:
╭──────────────────────────────────╮
│ b                                │
│ ---                              │
│ List[Utf8]                       │
╞══════════════════════════════════╡
│ ["hello world", "hi", "welcome"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["to", "a new world"]            │
╰──────────────────────────────────╯
(Showing first 2 of 2 rows)
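For reference, the intended semantics can be sketched in plain Python, independent of Daft. The helper name apply_to_each_element is hypothetical, and passing null rows through unchanged is an assumption rather than something this issue specifies:

```python
from typing import Any, Callable, List, Optional


def apply_to_each_element(
    column: List[Optional[List[Any]]], fn: Callable[[Any], Any]
) -> List[Optional[List[Any]]]:
    """Apply fn to every element of every list in the column.

    Assumption: a null (None) row stays null instead of raising.
    """
    return [None if row is None else [fn(x) for x in row] for row in column]


column = [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]
result = apply_to_each_element(column, str.lower)
print(result)
```

Each output list has the same length as its input list; only the elements change.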
jaychia commented
Two APIs come to mind for me:
col("a").list.apply(daft.element().str.lower())
col("a").list.apply(lambda el: el.str.lower())
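A toy sketch of how the two forms could share one code path: the lambda form is simply called with a fresh placeholder and must return an expression, which is then evaluated per element. Element, _StrNamespace and list_apply here are hypothetical stand-ins written for illustration, not Daft's actual classes:

```python
from typing import Any, Callable, List, Tuple, Union


class Element:
    """Hypothetical placeholder for "the current list element".

    It records operations and replays them later on a concrete value.
    """

    def __init__(self, ops: Tuple[Callable[[Any], Any], ...] = ()):
        self._ops = ops

    def _then(self, op: Callable[[Any], Any]) -> "Element":
        # Return a new expression with one more recorded operation.
        return Element(self._ops + (op,))

    def evaluate(self, value: Any) -> Any:
        # Replay the recorded operations, in order, on one element.
        for op in self._ops:
            value = op(value)
        return value

    @property
    def str(self) -> "_StrNamespace":
        return _StrNamespace(self)


class _StrNamespace:
    """Toy string namespace, mirroring the `.str` accessor style."""

    def __init__(self, parent: Element):
        self._parent = parent

    def lower(self) -> Element:
        return self._parent._then(lambda s: s.lower())


def list_apply(
    rows: List[List[Any]],
    fn_or_expr: Union[Element, Callable[[Element], Element]],
) -> List[List[Any]]:
    # Lambda form: call it with a fresh placeholder to obtain an expression.
    expr = fn_or_expr(Element()) if callable(fn_or_expr) else fn_or_expr
    return [[expr.evaluate(v) for v in row] for row in rows]


rows = [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]
via_expr = list_apply(rows, Element().str.lower())
via_lambda = list_apply(rows, lambda el: el.str.lower())
print(via_expr == via_lambda)
```

Under this design the lambda form is only syntactic sugar over the expression form, so both could be supported without duplicating evaluation logic.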
Also, maybe we want to use a different name than apply, since that's already how we do single-row UDFs? Other names that might be good:
- I think Polars uses .eval
- Maybe .list.map? Python already has the concept of mapping a function over data.
I'm also in favor of throwing an error when an aggregation expression is provided (e.g. .list.map(lambda x: x.sum())). This probably makes the most sense if paired with .list.map, since that name has a strong connotation of preserving output cardinality...
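A rough sketch of the cardinality rule in plain Python, assuming the mapped function receives one row's elements as a list and must return a list of the same length. The helper list_map is hypothetical; a real engine would more likely reject aggregations by inspecting the expression before execution rather than checking results at runtime:

```python
from typing import Any, Callable, List


def list_map(
    column: List[List[Any]], fn: Callable[[List[Any]], Any]
) -> List[List[Any]]:
    """Map fn over each row's elements, enforcing one output per input."""
    out = []
    for row in column:
        result = fn(row)
        # An aggregation such as sum() collapses the row to a single value,
        # which breaks the "same number of elements" contract.
        if not isinstance(result, list) or len(result) != len(row):
            raise ValueError(
                "list_map must preserve the number of elements per list; "
                "aggregations are not allowed"
            )
        out.append(result)
    return out


doubled = list_map([[1, 2], [3]], lambda xs: [x * 2 for x in xs])
print(doubled)

try:
    list_map([[1, 2], [3]], lambda xs: sum(xs))
    aggregation_rejected = False
except ValueError:
    aggregation_rejected = True
print(aggregation_rejected)
```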
@universalmind303 could you take this issue?