Functions as column extensions

Question

Closed this issue 3 years ago · 8 comments

@snithish @gorros @kirubakarrs @oscarvarto - I’ve always thought it’s a bit random how Spark defines some functionality as Column functions and other functionality as SQL functions. Here’s an example of the inconsistency:

lower(col("blah").substr(0, 2))

Having two SQL functions would look like this:

lower(substr(col("blah"), 0, 2))

Having two Column functions would look like this:

col("blah").substr(0, 2).lower()

I like the Column functions syntax, so I started monkey patching the SQL functions onto the Column class: https://github.com/MrPowers/spark-daria/blob/master/src/main/scala/com/github/mrpowers/spark/daria/sql/FunctionsAsColumnExt.scala

Let me know your thoughts.
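For anyone who has not opened the link, here is a minimal sketch of the implicit-class pattern this kind of monkey patching usually relies on. The names `ColumnExtSketch` and `ColumnMethods` are made up for illustration and are not taken from spark-daria; the only Spark calls used are `functions.lower`, `functions.trim`, `functions.col`, and `Column.substr`.

```scala
import org.apache.spark.sql.{Column, functions}
import org.apache.spark.sql.functions.col

// Hypothetical names; a sketch of the approach, not the spark-daria source.
object ColumnExtSketch {

  // An implicit value class adds methods to Column without subclassing it.
  implicit class ColumnMethods(val c: Column) extends AnyVal {
    // Each method forwards to the SQL function of the same name.
    // The calls are qualified with `functions.` so they do not resolve
    // to the methods being defined here.
    def lower(): Column = functions.lower(c)
    def trim(): Column = functions.trim(c)
  }

  // With the implicit class in scope, the all-Column style compiles:
  val chained: Column = col("blah").substr(0, 2).lower()
}
```

Every function has to be forwarded by hand in this style, which is the cost the rest of the thread tries to avoid.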

Answer 1 · 2018-12-16T10:34:33.000Z

@MrPowers I prefer this syntax: col("blah").substr(0, 2).lower(). I am curious whether there is a way to do that for all functions without explicitly defining them.

Answer 2 · 2018-12-16T13:58:31.000Z

@MrPowers but I can assist in monkey patching :)

Answer 3 · 2018-12-17T23:58:21.000Z

@gorros - Thanks for the help! Let me know if you find a clever way to do this without explicitly defining all the functions. In the meantime, I'm going to keep adding functions in the pattern you laid out in PR #51. Thanks!

Answer 4 · 2018-12-18T04:14:44.000Z

@MrPowers non-explicit solutions rely on reflection, and I am not sure they will work with implicit conversions. Also, I am not a fan of reflection. But I will try some more.

Answer 5 · 2018-12-18T07:39:22.000Z

@MrPowers I came up with another idea. Would you like to have the following syntax for the methods that take no additional arguments
col(" SOME_String ")|trim|lower
without rewriting each method?
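One way the pipe syntax could be wired up, as a rough sketch: a single `|` extension method that applies any `Column => Column` function. The names `ColumnPipeSketch` and `ColumnPipe` are hypothetical, and `|` is not a method on Spark's Scala `Column`. The sketch wraps `functions.trim` and `functions.lower` in explicit function values, because `functions.trim` is overloaded and passing it bare would depend on the compiler's overload resolution.

```scala
import org.apache.spark.sql.{Column, functions}
import org.apache.spark.sql.functions.col

// Hypothetical names; one possible reading of the proposal, not gorros's code.
object ColumnPipeSketch {

  // A single extension method covers every one-argument SQL function.
  implicit class ColumnPipe(val c: Column) extends AnyVal {
    def |(f: Column => Column): Column = f(c)
  }

  // Explicit function values sidestep overload resolution on functions.trim.
  val trim: Column => Column = c => functions.trim(c)
  val lower: Column => Column = c => functions.lower(c)

  // Parsed left to right as (col("blah") | trim) | lower.
  val cleaned: Column = col("blah") | trim | lower
}
```

Only functions whose sole argument is the column fit this shape; anything like `substr(0, 2)` would still need a lambda or a dedicated method.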

Answer 6 · 2018-12-18T14:56:42.000Z

My 2 cents. I like fluent interfaces over operator overloading as I find it creates more defensible code wrt keeping things as simple as possible.

However, I find the pipe overloading rather elegant in Scala, and would support that.

One thing that might be worth it to try is the Dynamic trait https://www.scala-lang.org/api/2.12.x/scala/Dynamic.html

Don't have much experience with it but might allow the Fluent interface with little code.

Probably a concern: As of Scala 2.10, defining direct or indirect subclasses of this trait is only possible if the language feature dynamics is enabled.
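To make the Dynamic idea concrete, here is a small sketch that combines it with the reflection approach mentioned earlier in the thread. It assumes a hypothetical wrapper class, `DynamicColumn`, since `Column` itself cannot be made to extend `Dynamic`, and it only handles SQL functions whose single parameter is a `Column`. The `scala.language.dynamics` import is what enables the language feature noted above.

```scala
import scala.language.dynamics
import org.apache.spark.sql.{Column, functions}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper; a sketch under the assumptions stated above.
class DynamicColumn(val underlying: Column) extends Dynamic {

  // A call such as d.lower() is rewritten by the compiler to
  // d.applyDynamic("lower")(). Without parentheses it would need
  // selectDynamic, which this sketch does not define.
  def applyDynamic(name: String)(args: Any*): DynamicColumn = {
    require(args.isEmpty, s"$name: only zero-argument calls are supported in this sketch")
    // Look up functions.<name>(Column) by reflection on the functions object.
    val method = functions.getClass.getMethods
      .find(m => m.getName == name && m.getParameterTypes.toList == List(classOf[Column]))
      .getOrElse(throw new NoSuchMethodException(s"functions.$name(Column)"))
    new DynamicColumn(method.invoke(functions, underlying).asInstanceOf[Column])
  }
}

object DynamicColumnSketch {
  // Usage: wrap once, chain by name, unwrap at the end.
  val cleaned: Column = new DynamicColumn(col("blah")).trim().lower().underlying
}
```

The trade-off is that a misspelled function name now fails at runtime with `NoSuchMethodException` instead of at compile time, which may be part of the reluctance about reflection voiced above.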

Answer 7 · 2018-12-18T16:07:10.000Z

@eclosson Thanks for info about Dynamic, I will check it out.

Answer 8 · 2019-01-12T19:28:40.000Z

@MrPowers did you have a chance to review my suggestion above?