This is a simple project for experimenting with Spark 3.
It is possible to create your own column functions, but at the time of writing (February 2022) this is still an experimental API, and your function has to be placed in the package org.apache.spark.sql.
See a basic implementation in the file MillisToTs.scala and the corresponding example.
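To show the general shape of such a function, here is a minimal sketch. It is not the contents of MillisToTs.scala. It assumes Spark 3.2 (the `withNewChildInternal` override does not exist in 3.0/3.1) and assumes the function turns epoch milliseconds (a Long column) into a timestamp. The names `MillisToTsExpr` and `MillisToTsFunctions` are hypothetical. The sketch relies on one fact about Spark: it stores `TimestampType` values internally as microseconds since the epoch, so the conversion is a multiplication by 1000.

```scala
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.expressions.{Expression, ImplicitCastInputTypes, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
import org.apache.spark.sql.types.{AbstractDataType, DataType, LongType, TimestampType}

// Hypothetical Catalyst expression: epoch milliseconds (Long) -> TimestampType.
// Spark keeps timestamps internally as microseconds since the epoch,
// so the result is millis * 1000 (no overflow check, like plain Java long math).
case class MillisToTsExpr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
  override def inputTypes: Seq[AbstractDataType] = Seq(LongType)
  override def dataType: DataType = TimestampType

  // Interpreted path; UnaryExpression.eval already returns null for null input.
  override protected def nullSafeEval(millis: Any): Any = millis.asInstanceOf[Long] * 1000L

  // Code-generated path with the same semantics (defineCodeGen adds the null handling).
  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
    defineCodeGen(ctx, ev, c => s"$c * 1000L")

  // Required by the tree API in Spark 3.2+.
  override protected def withNewChildInternal(newChild: Expression): MillisToTsExpr =
    copy(child = newChild)
}

// Hypothetical DataFrame-facing wrapper, used like a function from org.apache.spark.sql.functions.
object MillisToTsFunctions {
  def millisToTs(c: Column): Column = new Column(MillisToTsExpr(c.expr))
}
```

Usage would then look like `df.select(MillisToTsFunctions.millisToTs(col("ts_millis")))`, where `ts_millis` is a hypothetical Long column. Implementing both `nullSafeEval` and `doGenCode` keeps the interpreted and whole-stage-codegen paths consistent. A simpler alternative is to mix in `CodegenFallback` and skip `doGenCode`.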
Custom operators also rely on an experimental API, and they require a few more objects:
- The logical plan object (in our case, AlreadySorted).
- The physical plan object (AlreadySortedExec).
- The strategy object, which converts the logical plan into the physical plan (AlreadySortedStrategy).
All these objects are located in the file AlreadySorted.scala.
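The following is a minimal sketch of how these three objects can fit together. It is not the contents of AlreadySorted.scala. It assumes Spark 3.2 (for `withNewChildInternal`) and assumes the operator only marks its input as already sorted by some columns: rows pass through unchanged, but the physical node reports that ordering so the planner does not insert a sort. The `order` parameter and the `AlreadySortedOps.alreadySorted` helper are hypothetical. The helper calls `Dataset.ofRows`, which is `private[sql]`, and that is one reason to keep this kind of code in package org.apache.spark.sql.

```scala
package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, SortOrder}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
import org.apache.spark.sql.catalyst.plans.physical.Partitioning
import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}

// Logical plan node (hypothetical signature): marks `child` as sorted by `order`.
case class AlreadySorted(order: Seq[SortOrder], child: LogicalPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
  override protected def withNewChildInternal(newChild: LogicalPlan): AlreadySorted =
    copy(child = newChild)
}

// Physical plan node: passes rows through untouched but advertises `order`
// as its output ordering, so no SortExec is added for that ordering.
case class AlreadySortedExec(order: Seq[SortOrder], child: SparkPlan) extends UnaryExecNode {
  override def output: Seq[Attribute] = child.output
  override def outputOrdering: Seq[SortOrder] = order
  override def outputPartitioning: Partitioning = child.outputPartitioning
  override protected def doExecute(): RDD[InternalRow] = child.execute()
  override protected def withNewChildInternal(newChild: SparkPlan): AlreadySortedExec =
    copy(child = newChild)
}

// Strategy: replaces the logical node with the physical one; planLater lets
// the remaining strategies plan the child. Returning Nil means "not handled here".
object AlreadySortedStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case AlreadySorted(order, child) => AlreadySortedExec(order, planLater(child)) :: Nil
    case _ => Nil
  }
}

// Hypothetical helper that wraps a DataFrame in the AlreadySorted node
// (ascending order on the given columns only, to keep the sketch short).
object AlreadySortedOps {
  def alreadySorted(df: DataFrame, colNames: String*): DataFrame = {
    val order = colNames.map(name => SortOrder(df.col(name).expr, Ascending))
    Dataset.ofRows(df.sparkSession, AlreadySorted(order, df.queryExecution.analyzed))
  }
}
```

The logical node on its own does nothing. Only when the strategy is registered does the planner turn it into `AlreadySortedExec`, whose `outputOrdering` is what lets Spark skip an otherwise required sort. Declaring an ordering the data does not really have produces wrong results, so this is only safe when the input truly is sorted.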
The last step is to register the strategy with the Spark session. The registration code and a usage example are in the file App.scala.
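Registration itself is short. The sketch below is written in spark-shell/script style rather than copied from App.scala. It assumes that `AlreadySortedStrategy` is an object extending `org.apache.spark.sql.Strategy` and that it lives in package org.apache.spark.sql. Adjust the import if it lives elsewhere.

```scala
import org.apache.spark.sql.SparkSession
// Assumption: AlreadySortedStrategy is an object extending org.apache.spark.sql.Strategy
// in package org.apache.spark.sql; change this import to match the project.
import org.apache.spark.sql.AlreadySortedStrategy

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("already-sorted-example")
  .getOrCreate()

// Experimental hook: strategies listed here are tried by the planner
// before Spark's built-in strategies.
spark.experimental.extraStrategies = Seq(AlreadySortedStrategy)
```

An alternative that attaches the strategy while the session is being built is `SparkSessionExtensions.injectPlannerStrategy`, for example `SparkSession.builder().withExtensions(_.injectPlannerStrategy(_ => AlreadySortedStrategy))`.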
Used materials:
- https://medium.com/@vladimir.prus/advanced-custom-operators-in-spark-79b12da61ca7
- https://medium.com/@vladimir.prus/spark-partitioning-full-control-3c72cea2d74d