myui/hivemall

Implement Spark Native UDF interface

myui opened this issue · 1 comment

myui commented

Related to #345: Hive UDF invocation is slow in Spark.
We can do better, at least for UDFs (not currently for UDAFs/UDTFs), by having each function implement one of Spark's Java UDF interfaces (UDF1 through UDF22) in addition to Hive's UDF. For example:

class AngularDistanceUDF extends GenericUDF implements org.apache.spark.sql.api.java.UDF2
https://github.com/myui/hivemall/blob/master/core/src/main/java/hivemall/knn/distance/AngularDistanceUDF.java
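
To make the idea concrete, here is a rough sketch of what the Spark-facing half could look like. Everything in it is an assumption for illustration: the `UDF2` interface is a local stand-in that mirrors the shape of `org.apache.spark.sql.api.java.UDF2` so the snippet compiles without Spark on the classpath, `AngularDistanceSketch` is a hypothetical name, the inputs are dense `List<Double>` vectors rather than Hivemall's feature format, and the distance is the generic `acos(cosine similarity) / π`, which may not match the existing implementation exactly. The real class would also keep extending `GenericUDF`.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Local stand-in with the same shape as org.apache.spark.sql.api.java.UDF2
// (assumption: used here only so the sketch compiles without Spark).
interface UDF2<T1, T2, R> extends Serializable {
    R call(T1 t1, T2 t2) throws Exception;
}

// Hypothetical class: the Spark-native side of a distance UDF.
// In Hivemall it would additionally extend Hive's GenericUDF.
class AngularDistanceSketch implements UDF2<List<Double>, List<Double>, Double> {
    private static final long serialVersionUID = 1L;

    // Entry point Spark would invoke directly, bypassing Hive's ObjectInspectors.
    @Override
    public Double call(List<Double> a, List<Double> b) {
        return angularDistance(a, b);
    }

    // Generic angular distance: acos(cosine similarity) / pi, in [0, 1].
    static double angularDistance(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i), y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        double denom = Math.sqrt(normA * normB);
        if (denom == 0.0) {
            throw new IllegalArgumentException("zero-norm vector has no direction");
        }
        // Clamp to guard against rounding pushing the cosine outside [-1, 1].
        double cosine = Math.max(-1.0, Math.min(1.0, dot / denom));
        return Math.acos(cosine) / Math.PI;
    }

    public static void main(String[] args) {
        AngularDistanceSketch udf = new AngularDistanceSketch();
        System.out.println(udf.call(Arrays.asList(1.0, 0.0), Arrays.asList(0.0, 1.0)));
    }
}
```

With the real Spark interface in place, such a class could presumably be registered through Spark's `udf().register(name, udf, returnType)` with `DataTypes.DoubleType` as the return type, while Hive would keep using the `GenericUDF` path.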

We could also add some helper methods for the Spark API to
https://github.com/myui/hivemall/blob/master/core/src/main/java/hivemall/UDFWithOptions.java

@maropu What do you think?

Yeah, I think it's a good idea. I'll try it later.