Functions as column extensions
@snithish @gorros @kirubakarrs @oscarvarto - I’ve always thought it’s a bit random how Spark defines some functionality as Column functions and other functionality as SQL functions. Here’s an example of the inconsistency, where substr is a Column method but lower is a SQL function:
lower(col("blah").substr(0, 2))
Using SQL functions for both operations would look like this:
lower(substring(col("blah"), 0, 2))
Using Column functions for both operations would look like this:
col("blah").substr(0, 2).lower()
I like the Column functions syntax, so I started monkey patching the SQL functions onto the Column class:
https://github.com/MrPowers/spark-daria/blob/master/src/main/scala/com/github/mrpowers/spark/daria/sql/FunctionsAsColumnExt.scala
Let me know your thoughts.
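A minimal sketch of the implicit-class (extension method) approach; it is not the code in the linked file. To keep it runnable without a Spark dependency, Column, functions and col below are hypothetical stand-ins that only build an expression string. They mirror the real signatures of Column.substr(startPos: Int, len: Int) and functions.lower/upper/trim(e: Column), so the wrapper part should carry over if you import org.apache.spark.sql.{Column, functions} instead. FunctionsAsColumnExtSketch and ColumnMethods are made-up names.

```scala
// Hypothetical stand-ins for Spark's Column, functions and col, so this
// sketch compiles and runs without Spark; they only build an expression string.
final case class Column(expr: String) {
  // Mirrors the existing Column method substr(startPos: Int, len: Int)
  def substr(startPos: Int, len: Int): Column =
    Column(s"substring($expr, $startPos, $len)")
}

object functions {
  def col(colName: String): Column = Column(colName)
  def lower(e: Column): Column = Column(s"lower(${e.expr})")
  def upper(e: Column): Column = Column(s"upper(${e.expr})")
  def trim(e: Column): Column = Column(s"trim(${e.expr})")
}

// Hypothetical extension object: each method forwards to the SQL function
// of the same name, so the functions can be chained on a Column.
object FunctionsAsColumnExtSketch {
  implicit class ColumnMethods(c: Column) {
    def lower(): Column = functions.lower(c)
    def upper(): Column = functions.upper(c)
    def trim(): Column = functions.trim(c)
  }
}

object FluentDemo {
  import FunctionsAsColumnExtSketch._
  import functions.col

  def main(args: Array[String]): Unit = {
    // A real Column method followed by an extension method, read left to right
    println(col("blah").substr(0, 2).lower().expr) // prints lower(substring(blah, 0, 2))
  }
}
```

Each extension method is a one-line forwarder, so this approach needs one hand-written method per SQL function.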
@MrPowers I prefer the col("blah").substr(0, 2).lower() syntax, but I am curious whether there is a way to do that for all functions without explicitly defining each one.
@MrPowers Non-explicit solutions rely on reflection, and I am not sure they will work with implicit conversions. Also, I am not a fan of reflection, but I will keep experimenting.
@MrPowers I came up with another idea. Would you like the following syntax for functions that take no additional arguments,
col(" SOME_String ") | trim | lower
without having to rewrite each function as a method?
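A minimal sketch of the pipe idea, again using hypothetical stand-in Column and functions types so it runs without Spark. A single | extension method accepts any Column => Column, and Scala eta-expands trim and lower into function values, so no per-function wrapper is needed. ColumnPipe and PipeOps are made-up names.

```scala
// Hypothetical stand-ins for Spark's Column and functions (no Spark needed).
final case class Column(expr: String)

object functions {
  def col(colName: String): Column = Column(colName)
  def lower(e: Column): Column = Column(s"lower(${e.expr})")
  def trim(e: Column): Column = Column(s"trim(${e.expr})")
  def substring(str: Column, pos: Int, len: Int): Column =
    Column(s"substring(${str.expr}, $pos, $len)")
}

// Hypothetical pipe extension: `c | f` simply applies f to c.
object ColumnPipe {
  implicit class PipeOps(c: Column) {
    def |(f: Column => Column): Column = f(c)
  }
}

object PipeDemo {
  import ColumnPipe._
  import functions._

  def main(args: Array[String]): Unit = {
    // trim and lower are eta-expanded to Column => Column values
    println((col(" SOME_String ") | trim | lower).expr) // prints lower(trim( SOME_String ))
    // Functions with extra arguments need a lambda
    println((col("blah") | (c => substring(c, 0, 2)) | lower).expr) // prints lower(substring(blah, 0, 2))
  }
}
```

Functions that take extra arguments, such as substring, still need a small lambda.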
My 2 cents: I prefer fluent interfaces over operator overloading, because I find they produce more defensible code with respect to keeping things as simple as possible.
That said, I find the pipe overloading rather elegant in a Scala way, and I would support it.
One thing that might be worth trying is the Dynamic trait: https://www.scala-lang.org/api/2.12.x/scala/Dynamic.html
I don't have much experience with it, but it might allow the fluent interface with little code.
One probable concern: as of Scala 2.10, defining direct or indirect subclasses of this trait is only possible if the dynamics language feature is enabled (import scala.language.dynamics).
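A minimal sketch of how Dynamic could provide the fluent syntax without one wrapper per function, assuming hypothetical stand-in Column and functions types (so it runs without Spark) and a made-up FluentColumn wrapper. The compiler rewrites an unknown member access such as fc.lower into fc.selectDynamic("lower"), and the sketch then looks up a one-argument method of that name on the functions object through Java reflection, so it is essentially the reflection-based route discussed earlier in the thread. Spark's org.apache.spark.sql.functions is also a Scala object, so a similar lookup should work there, but treat that as untested.

```scala
import scala.language.dynamics // required to extend scala.Dynamic

// Hypothetical stand-ins for Spark's Column and functions (no Spark needed).
final case class Column(expr: String)

object functions {
  def col(colName: String): Column = Column(colName)
  def lower(e: Column): Column = Column(s"lower(${e.expr})")
  def trim(e: Column): Column = Column(s"trim(${e.expr})")
}

// Hypothetical wrapper: the compiler rewrites an unknown member access
// `fc.name` into `fc.selectDynamic("name")`.
final class FluentColumn(val underlying: Column) extends Dynamic {
  def selectDynamic(name: String): FluentColumn = {
    // Reflectively find functions.<name>(Column); throws NoSuchMethodException
    // at runtime if no such one-argument function exists.
    val method = functions.getClass.getMethod(name, classOf[Column])
    new FluentColumn(method.invoke(functions, underlying).asInstanceOf[Column])
  }
}

object DynamicDemo {
  def main(args: Array[String]): Unit = {
    val result = new FluentColumn(functions.col(" SOME_String ")).trim.lower
    println(result.underlying.expr) // prints lower(trim( SOME_String ))
  }
}
```

The trade-off is type safety: a misspelled name such as fc.lowerr compiles and only fails at runtime with a NoSuchMethodException, and the Column has to be wrapped and unwrapped explicitly.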