Azure/spark-cdm-connector

default.manifest.cdm.json not getting created

Closed this issue · 2 comments

I'm using PySpark with Apache Spark 2.4 on Azure Synapse Analytics. I'm trying to run the sample code (https://github.com/Azure/spark-cdm-connector/blob/master/samples/SparkCDMsamplePython.ipynb) provided in this repo to write a simple DataFrame to CDM format in an ADLS Gen2 path.

When I run this

(df.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", manifestPath)
    .option("entity", entityName)
    .option("format", "parquet")
    .mode("Overwrite")
    .save())

the parquet files get copied over to the folder location above, but the default.manifest.cdm.json doesn't get generated like it should.

Why is this happening? I've tried going through all the documentation and can't figure it out.

Can you check the driver logs for any errors?

I figured out the issue. I had to provide the appId and appSecret as options because my Synapse notebook was using the wrong ones I set in my Spark config.
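
For anyone who lands here with the same symptom, here is a rough sketch of passing the credentials explicitly on the write instead of relying on the Spark config. Treat the details as assumptions: the comment above only mentions an app ID and app secret; the option names `appKey` and `tenantId` are what I recall from the connector's documentation, so check the README for your version. The helper names `cdm_options` and `write_cdm` are made up for illustration.

```python
def cdm_options(storage, manifest_path, entity, app_id, app_key, tenant_id):
    """Build the option map for a CDM write with explicit credentials.

    Option names "appId", "appKey" and "tenantId" are assumed from the
    connector docs; they are not confirmed in this thread.
    """
    creds = {"appId": app_id, "appKey": app_key, "tenantId": tenant_id}
    # Fail early if any credential is empty, rather than silently
    # falling back to whatever is set in the Spark config.
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise ValueError("missing credential option(s): " + ", ".join(missing))

    opts = {
        "storage": storage,
        "manifestPath": manifest_path,
        "entity": entity,
        "format": "parquet",
    }
    opts.update(creds)
    return opts


def write_cdm(df, options, mode="overwrite"):
    """Apply every option to the DataFrame writer, then save."""
    writer = df.write.format("com.microsoft.cdm")
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.mode(mode).save()


# Usage in a notebook (the variables are the same ones as in the sample,
# plus your own service principal values):
#
# opts = cdm_options(storageAccountName, manifestPath, entityName,
#                    appId, appKey, tenantId)
# write_cdm(df, opts)
```

Building the options in one place makes it easy to print them (minus the secret) and confirm which credentials the write is actually using.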