swoop-inc/spark-alchemy

Error using HLL Functions in Spark

Closed this issue · 2 comments

Hello Devs

We are using EMR 6.1 with Spark 3.0 and spark-alchemy 2.12-1.0.1.jar.
The HLL functions were registered successfully in a Zeppelin notebook with the command below:
%spark
com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration.registerFunctions(spark)

When trying to process our data column (exposure_hll), which contains HLL sketches, we received an error:
%sql
select hll_cardinality(hll_merge(exposure_hll)) from table1

ERROR:
Error happens in sql: select hll_cardinality(hll_merge(exposure_hll)) from table1
net/agkn/hll/serialization/IHLLMetadata; line 1 pos 23

We are attaching the full error log; below are a few lines from it:

Caused by: org.apache.spark.sql.AnalysisException: net/agkn/hll/serialization/IHLLMetadata; line 1 pos 23
    at org.apache.spark.sql.EncapsulationViolator$.createAnalysisException(EncapsulationViolator.scala:11)
    at com.swoop.alchemy.spark.expressions.NativeFunctionRegistration.$anonfun$expression$4(NativeFunctionRegistration.scala:64)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:121)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1439)
    at org.apache.spark.sql.hive.HiveSessionCatalog.super$lookupFunction(HiveSessionCatalog.scala:135)
    at org.apache.spark.sql.hive.HiveSessionCatalog.$anonfun$lookupFunction0$2(HiveSessionCatalog.scala:135)

spark HLL Function error log .txt

Any help would be HUGELY appreciated! Thanks!!

Hello guys, any updates on this error? I have the same error in Databricks with the spark-alchemy library.

pidge commented

#27 has more details and a workaround.
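
For anyone landing here before reading the linked issue: the message `net/agkn/hll/serialization/IHLLMetadata` is a class name in the JVM's slash-separated form, which is usually how a class missing from the classpath gets reported. That is an inference from the log above, not a confirmed diagnosis. A minimal sketch of a check that could be run in a `%spark` paragraph follows; the helper name `isOnClasspath` is hypothetical, and the only input taken from this thread is the class name in the error message.

```scala
// Hypothetical helper (not part of spark-alchemy): reports whether a class
// can be loaded through the current thread's context class loader.
def isOnClasspath(className: String): Boolean =
  try {
    // initialize = false: only load the class, do not run static initializers.
    Class.forName(className, false, Thread.currentThread().getContextClassLoader)
    true
  } catch {
    // ClassNotFoundException: the class itself is missing.
    // LinkageError (includes NoClassDefFoundError): something it depends on is missing or incompatible.
    case _: ClassNotFoundException | _: LinkageError => false
  }

// Class name copied from the error message, converted from slash form to dot form.
val hllClass = "net/agkn/hll/serialization/IHLLMetadata".replace('/', '.')

println(s"$hllClass loadable: ${isOnClasspath(hllClass)}")
```

If this reports that the class is not loadable, the jar that provides the `net.agkn.hll` package is probably not on the interpreter's classpath, and the workaround in #27 is the place to look next.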