Ibotta/sk-dist

Any chance of also supporting the pandas_udf interface for even faster speeds?


Is your feature request related to a problem? Please describe.

Not a problem per se, but plain PySpark UDFs, which process one row at a time, are slower than PySpark pandas UDFs (`pandas_udf`), and both are slower than Scala UDFs.
Pandas UDFs, however, are still written in Python and work on pandas objects internally, so they are easy to write.
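
To illustrate the difference, here is a minimal sketch that is not taken from sk-dist. The names `plus_one_row`, `plus_one_batch`, and `compare_on_spark` are made up for this example, and the Spark part assumes Spark 3.x with pyarrow installed.

```python
import pandas as pd


def plus_one_row(value: int) -> int:
    # Row-at-a-time logic: a plain UDF calls this once per row.
    return value + 1


def plus_one_batch(values: pd.Series) -> pd.Series:
    # Vectorized logic: a pandas UDF calls this once per batch of rows,
    # which Spark hands over as a pandas Series.
    return values + 1


def compare_on_spark() -> None:
    # Spark imports are kept local so the two functions above can be
    # tried out without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, pandas_udf, udf
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.range(5).withColumnRenamed("id", "x")

    # Same logic, wrapped once as a plain UDF and once as a pandas UDF.
    row_udf = udf(plus_one_row, LongType())
    batch_udf = pandas_udf(plus_one_batch, returnType=LongType())

    df.select(
        col("x"),
        row_udf(col("x")).alias("row_udf"),
        batch_udf(col("x")).alias("pandas_udf"),
    ).show()
    spark.stop()
```

Calling `compare_on_spark()` should show the same values in both result columns. The difference is only in how the data reaches Python, so any speedup would have to be measured on a realistically sized DataFrame.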

Describe the solution you'd like

Any chance that you could add functionality so that things can also be done via the pandas_udf interface?

Describe alternatives you've considered

Additional context

This is how pandas UDFs work internally:

[image]

For more info:
https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html