Gmousse/dataframe-js

[FEATURE] Implicit column for dataframe API

Opened this issue · 7 comments

Is your feature request related to a problem? Please describe.
I find it redundant to keep typing a column reference for API functions that act on a single-column dataframe.

Describe the solution you'd like
Implicitly determine the column for functions applied to dataframes that have only one column. This could be done by checking the length of [...this[__columns__]]. It would be particularly useful for df.toArray(): on a single-column dataframe it currently returns an array of single-element arrays, for which I can't see any use case. Beyond that, I believe many other functions could benefit from this feature, and it would remove a lot of repetition.
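A minimal sketch of the proposed behaviour, written against a hypothetical MiniFrame class (rows stored as plain objects) rather than the real dataframe-js internals. The hypothetical resolveColumn helper falls back to the only column when none is given, and toArray flattens a single-column frame into a plain array instead of an array of one-element arrays.

```javascript
// Hypothetical helper, not part of dataframe-js: pick the column to operate on.
function resolveColumn(columns, column) {
  if (column !== undefined) {
    if (!columns.includes(column)) {
      throw new Error(`Unknown column: ${column}`);
    }
    return column;
  }
  // Implicit column: only possible when the frame has exactly one column.
  if (columns.length === 1) {
    return columns[0];
  }
  throw new Error(`Expected a column name: frame has ${columns.length} columns`);
}

// Hypothetical minimal frame, used only to illustrate the idea.
class MiniFrame {
  constructor(rows, columns) {
    this.rows = rows; // array of plain objects, one per row
    this.columns = columns; // ordered list of column names
  }

  select(...names) {
    const rows = this.rows.map((row) =>
      Object.fromEntries(names.map((name) => [name, row[name]]))
    );
    return new MiniFrame(rows, names);
  }

  // No argument on a multi-column frame: one array per row.
  // Explicit column, or a single-column frame: a flat array of values.
  toArray(column) {
    if (column === undefined && this.columns.length !== 1) {
      return this.rows.map((row) => this.columns.map((name) => row[name]));
    }
    const name = resolveColumn(this.columns, column);
    return this.rows.map((row) => row[name]);
  }
}

const df = new MiniFrame(
  [
    { id: 1, city: "Paris" },
    { id: 2, city: "Lyon" },
  ],
  ["id", "city"]
);

console.log(df.toArray()); // [ [ 1, 'Paris' ], [ 2, 'Lyon' ] ]
console.log(df.select("city").toArray()); // [ 'Paris', 'Lyon' ]
```

Keeping the explicit-column path means existing calls that pass a name keep working; the implicit path only kicks in when there is exactly one candidate, so a multi-column frame never silently picks a column.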

Describe alternatives you've considered
Perhaps a new Series object, similar to the one in the pandas library.
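As a rough illustration of the Series alternative, here is a hypothetical Series class (not an existing dataframe-js API) that wraps the values of one column, with unique() modelled on the pandas method of the same name. The column() accessor is also hypothetical; how a real DataFrame would hand out a Series is left open.

```javascript
// Hypothetical Series: a named, one-dimensional sequence of values.
class Series {
  constructor(name, values) {
    this.name = name;
    this.values = [...values]; // copy, so the Series owns its data
  }

  // Distinct values in order of first appearance (Set keeps insertion order).
  unique() {
    return new Series(this.name, new Set(this.values));
  }

  toArray() {
    return [...this.values];
  }
}

// Hypothetical accessor: build a Series from row objects for one column.
function column(rows, name) {
  return new Series(name, rows.map((row) => row[name]));
}

const rows = [{ city: "Paris" }, { city: "Lyon" }, { city: "Paris" }];
console.log(column(rows, "city").unique().toArray()); // [ 'Paris', 'Lyon' ]
```

Because a Series always holds exactly one column, methods like unique() never need a column argument, which is the same redundancy the implicit-column idea tries to remove.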

Additional context
N/A

Hi, that's a good point.
I will see what can be done!
Thank you for your suggestion.

Thank you for the reply and interest in this issue, @Gmousse!

Since I last posted, I have come up with a more interesting use case: this feature would make introspecting variables a much more pleasant experience.
Take the two examples below, one in pandas and the other in dataframe-js:

# pandas
df[column].unique()

// dataframe-js
df.select(column).distinct(column).show(column)

Many of us come from a pandas background, so the verbosity needed to compose these calls is confusing.

Yep, that's clear.
I'm working on a new version in an experimental branch.
I will try a few things.

@Gmousse, have you made any progress on this? I keep running into situations where this redundancy comes up, so it would be a really nice feature to have. If time is the issue, let me know and I'd be happy to submit a PR 😃

Hi, sorry, I've been a bit busy lately.
I still need to work on the API proposal for this feature.

Hi @mbkupfer, I'm currently working on it. I will post an API proposal in this issue.

Hi, it would also be nice if df.drop('column2') could accept an array of column names instead of a single column name. Could this be included in the new release?
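One way to support this without breaking existing calls is to normalise the argument first, so a single name and an array of names follow the same code path. A minimal sketch on plain row objects; dropColumns is a hypothetical helper, not the current df.drop implementation.

```javascript
// Hypothetical helper: accept either a single column name or an array of names.
function dropColumns(rows, columns, toDrop) {
  const names = Array.isArray(toDrop) ? toDrop : [toDrop];
  // Fail loudly on typos instead of silently ignoring unknown names.
  const unknown = names.filter((name) => !columns.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown column(s): ${unknown.join(", ")}`);
  }
  const kept = columns.filter((name) => !names.includes(name));
  const newRows = rows.map((row) =>
    Object.fromEntries(kept.map((name) => [name, row[name]]))
  );
  return { rows: newRows, columns: kept };
}

const rows = [{ column1: 1, column2: 2, column3: 3 }];
const columns = ["column1", "column2", "column3"];

console.log(dropColumns(rows, columns, "column2").columns); // [ 'column1', 'column3' ]
console.log(dropColumns(rows, columns, ["column2", "column3"]).columns); // [ 'column1' ]
```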