audienceproject/spark-dynamodb

Using spark-dynamodb on databricks notebook "Builtin jars can only be used when hive execution version == hive metastore version"

Closed this issue · 1 comment

Hi,

I installed your package on our Databricks cluster. After installation I could not run any Scala in the notebook; needless to say, removing the package resolves the issue. I looked into the driver logs and I think this is the source of the problem:

Builtin jars can only be used when hive execution version == hive metastore version

Could you please advise?

Thanks,
Sina

Here is the log:

20/08/21 14:36:12 INFO DriverCorral: Successfully attached library dbfs:/FileStore/jars/maven/org/apache/httpcomponents/httpclient-4.5.9.jar to Spark
20/08/21 14:36:12 INFO LibraryState: Successfully attached library dbfs:/FileStore/jars/maven/org/apache/httpcomponents/httpclient-4.5.9.jar
20/08/21 14:36:46 INFO DriverCorral: Starting scala repl ReplId-2cd01-b4570-c31c1-3
20/08/21 14:36:46 WARN ScalaDriverLocal: loadLibraries: Libraries failed to be installed: Set()
20/08/21 14:36:46 INFO LogicalPlanStats: Setting LogicalPlanStats visitor to com.databricks.sql.optimizer.statsEstimation.DatabricksLogicalPlanStatsVisitor$
20/08/21 14:36:47 ERROR ScalaDriverLocal: Failed to add jar /local_disk0/tmp/addedFile7582802293255035276httpcore_4_4_11-d55e7.jar
java.lang.IllegalArgumentException: Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 1.2.1. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:363)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:326)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:76)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:75)
at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:108)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:148)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:359)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:147)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:290)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:212)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:199)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:47)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$resourceLoader$1(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.client$lzycompute(HiveSessionStateBuilder.scala:129)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.client(HiveSessionStateBuilder.scala:129)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:131)
at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:41)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:233)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3682)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:115)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:246)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:828)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:76)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:196)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3680)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:233)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:828)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:663)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:828)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:658)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:672)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$3(DriverLocal.scala:154)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1038)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$new$2(DriverLocal.scala:154)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at com.databricks.backend.daemon.driver.DriverLocal.<init>(DriverLocal.scala:138)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.<init>(ScalaDriverLocal.scala:45)
at com.databricks.backend.daemon.driver.ScalaDriverWrapper.instantiateDriver(DriverWrapper.scala:704)
at com.databricks.backend.daemon.driver.DriverWrapper.setupRepl(DriverWrapper.scala:299)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:218)
at java.lang.Thread.run(Thread.java:748)
20/08/21 14:36:47 WARN ScalaDriverWrapper: Failed to start repl ReplId-2cd01-b4570-c31c1-3
java.lang.IllegalArgumentException: Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 1.2.1. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.
(stack trace identical to the ERROR trace above)
20/08/21 14:36:47 WARN ScalaDriverWrapper: setupRepl:ReplId-2cd01-b4570-c31c1-3: at the end, the status is Error(ReplId-2cd01-b4570-c31c1-3,java.lang.IllegalArgumentException: Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 1.2.1. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.)
20/08/21 14:36:47 INFO DriverCorral$: Cleaning the wrapper ReplId-2cd01-b4570-c31c1-3 (currently in status Stopped(ReplId-2cd01-b4570-c31c1-3))
20/08/21 14:36:47 INFO DriverCorral$: sending shutdown signal for REPL ReplId-2cd01-b4570-c31c1-3
20/08/21 14:36:47 WARN ScalaDriverWrapper: Repl ReplId-2cd01-b4570-c31c1-3 is already shutting down: Stopped(ReplId-2cd01-b4570-c31c1-3)
20/08/21 14:36:47 INFO DriverCorral$: sending the interrupt signal for REPL ReplId-2cd01-b4570-c31c1-3
20/08/21 14:36:47 INFO DriverCorral$: waiting for localThread to stop for REPL ReplId-2cd01-b4570-c31c1-3
20/08/21 14:36:47 INFO DriverCorral$: ReplId-2cd01-b4570-c31c1-3 successfully discarded

Hi,

It turns out that the error was coming from Databricks itself. If anyone runs into the same issue: I used the following Spark configuration to get com.audienceproject:spark-dynamodb_2.12:1.1.0 working on Databricks Runtime 7.2 (Spark 3.0.0, Scala 2.12):

spark.sql.hive.metastore.jars builtin
spark.sql.hive.metastore.version 2.3.7
hive.metastore.schema.verification.record.version false
hive.metastore.schema.verification false 
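
Note that these properties need to go in the cluster's Spark config (they are read when the driver starts and the metastore client is created), not set from a notebook cell. For anyone verifying the fix after a cluster restart, a minimal smoke test along these lines should work in a notebook cell; the table name here is hypothetical, and the dynamodb reader comes from the connector's implicits as documented in its README:

import com.audienceproject.spark.dynamodb.implicits._

// Hypothetical table name, for illustration only.
// Reads the DynamoDB table into a DataFrame with an inferred schema.
val dynamoDf = spark.read.dynamodb("MyDynamoTable")
dynamoDf.printSchema()
dynamoDf.show(5)

If the REPL starts and this cell runs without the "Builtin jars" error, the metastore configuration is in effect.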

I am going to close this issue.
Sina