Azure/spark-cdm-connector

[Issue] Reading error "AnalysisException: Manifest doesn't exist: model.json"


Did you read the pinned issues and search the error message?

Yes, but I didn't find the answer.

Summary of issue

We ingest several tables from a notebook, and the read operations run in parallel. Some of the tables fail on every run, and different tables fail on different runs.

A rerun works, but the reads fail again on the next run. There was no problem before; some tables started failing a few days ago, without any modification on our side.
The parallel operations are all reads against the same manifestPath; there are no parallel writes.
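Since a rerun succeeds, one stopgap we are considering is to retry the failing read. This is only a minimal sketch, assuming the failure is transient; it does not address the root cause. `read_with_retry` and its parameters are hypothetical names of our own, not part of the connector:

```python
import time


def read_with_retry(load_fn, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call load_fn() until it succeeds or the attempts are exhausted.

    load_fn is a hypothetical zero-argument callable that wraps the read.
    Only errors whose message contains "Manifest doesn't exist" are retried;
    any other error is re-raised immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return load_fn()
        except Exception as error:  # in the notebook this is an AnalysisException
            if "Manifest doesn't exist" not in str(error):
                raise
            last_error = error
            if attempt < attempts:
                # Linear backoff: wait a little longer after each failure.
                sleep(delay_seconds * attempt)
    raise last_error


# Intended use in the notebook (not executed here; the remaining options
# are the same as in the failing call shown in the traceback):
# df = read_with_retry(lambda: (spark.read.format("com.microsoft.cdm")
#     .option("storage", storagePath)
#     .option("manifestPath", sourceFileSystem + "/model.json")
#     .option("entity", entity)
#     .load()))
```

Matching on the message text keeps unrelated failures (bad credentials, a missing entity) from being retried silently.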

Cluster DBR version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)

The error message shows "AnalysisException: Manifest doesn't exist: model.json":

AnalysisException                         Traceback (most recent call last)
in
----> 1 df = (spark.read.format("com.microsoft.cdm")
      2   .option("storage", storagePath)
      3   .option("manifestPath", sourceFileSystem + "/model.json")
      4   .option("entity", entity)
      5   .option("appId", appId)

 

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    208             return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    209         else:
--> 210             return self._df(self._jreader.load())
    211 
    212     def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,

 

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in call(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    121                 # Hide where the exception came from that shows a non-Pythonic
    122                 # JVM exception message.
--> 123                 raise converted from None
    124             else:
    125                 raise

Error stack trace

No response

Platform name

Azure Databricks

Spark version

3.1.2

CDM jar version

1.19.2

What is the format of the data you are trying to read/write?

.csv