mrpowers-io/spark-daria

Functions as column extensions

Closed this issue · 8 comments

@snithish @gorros @kirubakarrs @oscarvarto - I’ve always thought it’s a bit random how Spark defines some functionality as Column functions and other functionality as SQL functions. Here’s an example of the inconsistency:

lower(col("blah").substr(0, 2))

Having two SQL functions would look like this:

lower(substr(col("blah"), 0, 2))

Having two Column functions would look like this:

col("blah").substr(0, 2).lower()

I like the Column functions syntax, so I started monkey patching the SQL functions onto the Column class: https://github.com/MrPowers/spark-daria/blob/master/src/main/scala/com/github/mrpowers/spark/daria/sql/FunctionsAsColumnExt.scala Let me know your thoughts.
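For readers unfamiliar with the pattern, here is a minimal sketch of how "monkey patching" a SQL function onto Column can be done with a Scala implicit class. It is not the contents of the linked file: the names ColumnExtSketch, ColumnOps, and ColumnExtUsage are made up for illustration, only two functions are shown, and it assumes spark-sql is on the classpath.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions

object ColumnExtSketch {
  // Implicit class: any Column in scope gains the methods defined here.
  // ColumnOps is a hypothetical name chosen for this sketch.
  implicit class ColumnOps(val c: Column) {
    // Each method delegates to the existing SQL function of the same name.
    def lower(): Column = functions.lower(c)
    def trim(): Column = functions.trim(c)
  }
}

object ColumnExtUsage {
  import org.apache.spark.sql.functions.col
  import ColumnExtSketch._

  // Reads left to right: take the substring first, then lower-case it.
  val fluent: Column = col("blah").substr(0, 2).lower()
}
```

The cost of this approach is the one raised later in the thread: every function needs its own explicit one-line wrapper.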

@MrPowers I prefer this syntax col("blah").substr(0, 2).lower(), but I am curious if there is a way to do that for all functions without explicitly defining them.

@MrPowers but I can assist in monkey patching :)

@gorros - Thanks for the help! Let me know if you find a clever way to do this without explicitly defining all the functions. In the meantime, I'm going to keep adding functions in the pattern you laid out in PR #51. Thanks!

@MrPowers Non-explicit solutions rely on reflection, and I am not sure they will work with implicit conversions. Also, I am not a fan of reflection. But I will try some more.

@MrPowers I came up with another idea. Would you like to have the following syntax for methods that take no additional arguments,
col(" SOME_String ")|trim|lower
without rewriting each method?
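One way such a pipe syntax could work is a single implicit `|` operator that accepts any one-argument function. The sketch below uses a stand-in Col case class rather than Spark's Column so that it runs without a Spark dependency; Col, PipeSketch, PipeOps, and the local trim and lower functions are all hypothetical and only mirror the shape of the real one-argument SQL functions.

```scala
// Stand-in for Spark's Column: it only records the expression as a string.
final case class Col(expr: String)

object PipeSketch {
  // Hypothetical one-argument functions shaped like the SQL functions.
  def trim(c: Col): Col = Col(s"trim(${c.expr})")
  def lower(c: Col): Col = Col(s"lower(${c.expr})")

  // A single operator covers every Col => Col function,
  // so no per-function wrapper is needed.
  implicit class PipeOps(val c: Col) {
    def |(f: Col => Col): Col = f(c)
  }

  // Methods are eta-expanded to functions because `|` expects Col => Col.
  val piped: Col = Col(" SOME_String ") | trim | lower
}
```

Here piped.expr is "lower(trim( SOME_String ))". Functions that take extra arguments would still need a lambda or a partially applied wrapper, which matches the "without additional arguments" restriction in the proposal.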

My 2 cents. I prefer fluent interfaces to operator overloading, as I find they lead to more defensible code with respect to keeping things as simple as possible.

However, I find the pipe overloading rather elegant in Scala and would support that.

One thing that might be worth it to try is the Dynamic trait https://www.scala-lang.org/api/2.12.x/scala/Dynamic.html

I don't have much experience with it, but it might allow the fluent interface with little code.

Probably a concern: As of Scala 2.10, defining direct or indirect subclasses of this trait is only possible if the language feature dynamics is enabled.
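To make the Dynamic idea concrete, here is a rough sketch of how the trait turns unknown method names into runtime lookups. It uses a stand-in DynCol class instead of Spark's Column so that it runs on its own; DynCol, the known set, and DynamicSketch are hypothetical, and a real version would have to dispatch to the SQL functions by name (for example through reflection).

```scala
import scala.language.dynamics // enables the "dynamics" language feature

// Stand-in for Spark's Column: it only records the expression as a string.
final class DynCol(val expr: String) extends Dynamic {
  // Hypothetical list of function names this sketch accepts.
  private val known = Set("trim", "lower", "upper", "substr")

  private def call(name: String, args: Seq[Any]): DynCol = {
    require(known(name), s"unknown function: $name")
    new DynCol((expr +: args.map(_.toString)).mkString(s"$name(", ", ", ")"))
  }

  // c.lower is rewritten by the compiler to c.selectDynamic("lower").
  def selectDynamic(name: String): DynCol = call(name, Nil)

  // c.substr(0, 2) is rewritten to c.applyDynamic("substr")(0, 2).
  def applyDynamic(name: String)(args: Any*): DynCol = call(name, args)
}

object DynamicSketch {
  val fluent: DynCol = new DynCol("blah").substr(0, 2).lower
}
```

The trade-off is that method names are only checked at runtime: a typo such as .lowr compiles and then fails inside require, whereas an explicitly defined wrapper would be rejected by the compiler.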

@eclosson Thanks for the info about Dynamic, I will check it out.

@MrPowers did you get a chance to review my suggestion above?