YotpoLtd/metorikku

How to use Apache Spark Connector?

Rap70r opened this issue · 0 comments

Hello,

Is it possible to use Apache Spark Connector for SQL Server with metorikku?

https://github.com/microsoft/sql-spark-connector

I'm trying to create custom code that I can pass a dataframe to, so that I can load that dataframe into a SQL Server table. This is what I have so far: custom code with a run function that will live in the custom JAR:

import org.apache.spark.sql.SparkSession

object SomeObject {

  def run(ss: SparkSession, metricName: String, dataFrameName: String, params: Option[Map[String, String]]): Unit = {

    // Build the JDBC URL for the target SQL Server instance.
    val serverName   = "jdbc:sqlserver://{SERVER_ADDR}"
    val databaseName = "database_name"
    val url          = serverName + ";databaseName=" + databaseName + ";"

    val tableName = "table_name"
    val username  = "username"
    val password  = "password"

    // df_name_here is the placeholder I don't know how to fill in (see below).
    df_name_here.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("overwrite")
      .option("url", url)
      .option("dbtable", tableName)
      .option("user", username)
      .option("password", password)
      .save()
  }
}

Note: the reason I'm using custom code is so that I can use the "com.microsoft.sqlserver.jdbc.spark" format.

Can you please help me figure out how to get the dataframe inside the function, so I can replace df_name_here with it?
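
My best guess is that metorikku registers each step's dataframe as a temp view under its name, in which case ss.table(dataFrameName) would pull it back out of the session. A minimal sketch, assuming that registration behavior (which I haven't verified):

import org.apache.spark.sql.{DataFrame, SparkSession}

object SomeObject {

  def run(ss: SparkSession, metricName: String, dataFrameName: String, params: Option[Map[String, String]]): Unit = {
    // Assumption: metorikku exposes the step's output as a temp view named
    // dataFrameName, so the session catalog can hand it back as a DataFrame.
    val df: DataFrame = ss.table(dataFrameName)

    df.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("overwrite")
      .option("url", "jdbc:sqlserver://{SERVER_ADDR};databaseName=database_name;")
      .option("dbtable", "table_name")
      .option("user", "username")
      .option("password", "password")
      .save()
  }
}

If that's not how metorikku wires dataFrameName up, is there an intended way to get at the dataframe from inside run?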

Alternatively, can I use the standard JDBC output and have it specify format("com.microsoft.sqlserver.jdbc.spark")?

I'm not sure that's possible, since it takes that value from the driver setting:
https://github.com/YotpoLtd/metorikku/blob/master/src/main/scala/com/yotpo/metorikku/output/writers/jdbc/JDBCOutputWriter.scala#L42
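
For context, this is roughly the standard JDBC output configuration I mean (keys reconstructed from memory of the README and examples, so they may be slightly off):

# Job-level config: connection details for the JDBC output.
output:
  jdbc:
    connectionUrl: "jdbc:sqlserver://{SERVER_ADDR};databaseName=database_name;"
    user: "username"
    password: "password"
    # A JDBC driver class goes here; "com.microsoft.sqlserver.jdbc.spark"
    # is a data source format, which is why I suspect this won't work as-is.
    driver: "com.microsoft.sqlserver.jdbc.SQLServerDriver"

# Metric-level output definition.
output:
  - dataFrameName: df_name_here
    outputType: JDBC
    outputOptions:
      saveMode: Overwrite
      dbTable: table_name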

Thank you