[FEATURE] Improve performance of KLLSketch and DataType Analyzer
zeotuan opened this issue · 0 comments
Is your feature request related to a problem? Please describe.
Currently, the KLLSketch and DataType analyzers are implemented using UserDefinedAggregateFunction, which is deprecated and should be replaced with Aggregator, which offers much better performance, as outlined in apache/spark#25024 (comment).
Describe the solution you'd like
Reimplement StatefulDataType and StatefulKLLSketch using Aggregator.
I am happy to help with this implementation.
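To make the proposal concrete, here is a minimal sketch of what an Aggregator-based replacement could look like. It is not the existing StatefulDataType implementation: the `TypeCounts` buffer, the `DataTypeCountAggregator` object, and the regex rules are all hypothetical and only illustrate the shape of the API. It assumes Spark 3.0 or later, where `functions.udaf` is available to register a typed Aggregator for use on untyped columns.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical buffer type for illustration; not deequ's actual state layout.
case class TypeCounts(nulls: Long, integral: Long, fractional: Long,
                      boolean: Long, string: Long)

// Counts how many string values look like each basic data type.
object DataTypeCountAggregator extends Aggregator[String, TypeCounts, TypeCounts] {
  // Simplified, assumed classification rules (a Regex pattern must match the whole value).
  private val Integral = """[-+]?\d+""".r
  private val Fractional = """[-+]?\d*\.\d+""".r
  private val Bool = """(?i)(?:true|false)""".r

  // Empty buffer for a new aggregation group.
  def zero: TypeCounts = TypeCounts(0L, 0L, 0L, 0L, 0L)

  // Fold one input value into the buffer.
  def reduce(b: TypeCounts, value: String): TypeCounts = value match {
    case null         => b.copy(nulls = b.nulls + 1)
    case Integral()   => b.copy(integral = b.integral + 1)
    case Fractional() => b.copy(fractional = b.fractional + 1)
    case Bool()       => b.copy(boolean = b.boolean + 1)
    case _            => b.copy(string = b.string + 1)
  }

  // Combine partial buffers produced on different partitions.
  def merge(a: TypeCounts, b: TypeCounts): TypeCounts = TypeCounts(
    a.nulls + b.nulls,
    a.integral + b.integral,
    a.fractional + b.fractional,
    a.boolean + b.boolean,
    a.string + b.string)

  // The final result is the buffer itself in this sketch.
  def finish(reduction: TypeCounts): TypeCounts = reduction

  def bufferEncoder: Encoder[TypeCounts] = Encoders.product[TypeCounts]
  def outputEncoder: Encoder[TypeCounts] = Encoders.product[TypeCounts]
}

object AggregatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aggregator-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("1", "2.5", "true", "abc", null).toDF("value")

    // Wrap the typed Aggregator so it can be applied to a DataFrame column.
    val dataTypeCounts = udaf(DataTypeCountAggregator)
    df.agg(dataTypeCounts(col("value")).as("counts")).show(false)

    spark.stop()
  }
}
```

A KLL sketch version would presumably follow the same pattern, with the sketch state as the buffer type and a suitable encoder for it (for example a binary or Kryo encoder); that part is left open here.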