Azure/spark-cdm-connector

[Issue] Could not read '/foundations.cdm.json'

Closed this issue · 8 comments

Did you read the pinned issues and search the error message?

Yes, but I didn't find the answer.

Summary of issue

I have been writing to an Azure storage account with this connector from Databricks for several months, using the following configuration.

my_df.repartition(1) \
  .write.format("com.microsoft.cdm") \
  .option("appId", appID) \
  .option("appKey", appKey) \
  .option("tenantId", tenantID) \
  .option("storage", storage) \
  .option("manifestPath",  "powerbi/GSG-GSS-ThingWorx/DPS_DeviceConnectivity/default.manifest.cdm.json") \
  .option("entity", "PerInstrument") \
  .mode("append") \
  .save()

Twice now the connector has started throwing errors about foundations.cdm.json. When this happens it fails consistently for several days, then seemingly recovers on its own.

Are there any ideas on why this failure appears and resolves itself seemingly at random?
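In case it helps others hitting the same intermittent failure: while the root cause was unknown, a simple exponential-backoff retry around the write can paper over short outages. This is a sketch of my own; the helper name and its defaults are hypothetical, not part of the connector.

```python
import time

def write_with_retry(write_fn, attempts=3, base_delay=30):
    """Retry a flaky zero-argument write callable with exponential backoff.

    attempts:   total tries before giving up
    base_delay: seconds to sleep after the first failure (doubles each retry)
    """
    for attempt in range(attempts):
        try:
            return write_fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the original error
            time.sleep(base_delay * 2 ** attempt)

# Usage with the write above:
# write_with_retry(lambda: my_df.repartition(1)
#     .write.format("com.microsoft.cdm")
#     ...
#     .save())
```

This obviously does not fix a multi-day outage, but it smooths over transient failures of a single job run.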

Error stack trace

Py4JJavaError: An error occurred while calling o1174.save.
: org.apache.spark.SparkException: [WRITING_JOB_ABORTED] Writing job aborted
	at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:996)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:419)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:363)
	at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.writeWithV2(WriteToDataSourceV2Exec.scala:69)
	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:516)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1731)
	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:501)
	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:496)
	at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:69)
	at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:94)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:229)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:249)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:399)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:194)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:148)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:349)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:229)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:214)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:227)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:220)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:220)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:220)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:174)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:165)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:256)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:381)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:259)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
	Suppressed: java.lang.UnsupportedOperationException: Not supported
		at com.microsoft.cdm.CDMCatalog.dropTable(CDMCatalog.scala:50)
		at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$2(WriteToDataSourceV2Exec.scala:534)
		at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1742)
		... 48 more
Caused by: java.util.concurrent.ExecutionException: java.lang.Exception: PersistenceLayer | Could not read '/foundations.cdm.json' from the 'cdm' namespace. Reason 'com.microsoft.commondatamodel.objectmodel.storage.StorageAdapterException: Could not read content at path: /logical/foundations.cdm.json' | loadDocumentFromPathAsync | /foundations.cdm.json
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at com.microsoft.cdm.utils.CDMModelWriter.createRootManifestWithEntityScratch(CDMModelWriter.scala:345)
	at com.microsoft.cdm.utils.CDMModelWriter.createEntity(CDMModelWriter.scala:488)
	at com.microsoft.cdm.write.CDMBatchWriter.$anonfun$commit$4(CDMBatchWriter.scala:218)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at com.microsoft.cdm.log.SparkCDMLogger$.logEventToKustoForPerf(SparkCDMLogger.scala:43)
	at com.microsoft.cdm.write.CDMBatchWriter.commit(CDMBatchWriter.scala:219)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:402)
	... 52 more
Caused by: java.lang.Exception: PersistenceLayer | Could not read '/foundations.cdm.json' from the 'cdm' namespace. Reason 'com.microsoft.commondatamodel.objectmodel.storage.StorageAdapterException: Could not read content at path: /logical/foundations.cdm.json' | loadDocumentFromPathAsync | /foundations.cdm.json
	at com.microsoft.cdm.utils.CDMCallback$.apply(CDMUtils.scala:80)
	at com.microsoft.commondatamodel.objectmodel.utilities.logger.Logger.log(Logger.java:194)
	at com.microsoft.commondatamodel.objectmodel.utilities.logger.Logger.error(Logger.java:88)
	at com.microsoft.commondatamodel.objectmodel.persistence.PersistenceLayer.lambda$loadDocumentFromPathAsync$0(PersistenceLayer.java:188)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Platform name

Databricks

Spark version

3.3.0

CDM jar version

spark3.3-1.19.5

What is the format of the data you are trying to read/write?

.csv

The default location https://cdm-schema.microsoft.com/logical/ for resolving cdm: URIs has an outdated TLS certificate.
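A quick way to verify this independently: a small standard-library script (a sketch, not part of the connector) can report how many days remain on a host's TLS certificate. The host name below mirrors the URL above.

```python
import socket
import ssl
from datetime import datetime, timezone

def parse_not_after(not_after: str) -> datetime:
    # Format returned by ssl.getpeercert(), e.g. 'Jun  1 12:00:00 2025 GMT'
    return datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z").replace(
        tzinfo=timezone.utc
    )

def cert_days_remaining(host: str, port: int = 443) -> float:
    """Return days until the server's TLS certificate expires (negative if expired)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    delta = parse_not_after(not_after) - datetime.now(timezone.utc)
    return delta.total_seconds() / 86400

# Requires network access:
# print(cert_days_remaining("cdm-schema.microsoft.com"))
```

Note that an expired certificate will make the handshake itself fail with default verification, which is the same symptom the connector sees.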


The TLS cert has been updated and the spark-cdm connector is working just fine now.

Closing this as resolved

This issue is back.

Opened support case 2404010010003788 with Azure as well.

Maybe related? I saw this on the CDM store repo:

The CDM Schema Store will be shut down by end of March '24, and any services still using the older CDM SDK releases may start failing due to unavailability of the store.


Hello @carlo-quinonez @PrestonGiorgianni, if you are using an older CDM connector version, please upgrade to the latest. Please check out the releases and issue #162.

Thank you for the fix