Azure/spark-cdm-connector

NoSuchTableException: Manifest doesn't exist. root.manifest.cdm.json #Spark3.2.1 jar #Databricks

Closed this issue · 2 comments

Hi @sricheta92 @kecheung ,

I'm facing an issue while running the code below to create a file on ADLS using the Spark 3.2.1 connector.
Error message: NoSuchTableException: Manifest doesn't exist. root.manifest.cdm.json


We are trying to run the sample code you shared, with the following Databricks configuration:
Runtime version: 10.4 LTS
Library: spark-cdm-connector-assembly-synapse-spark3.2-1.19.4.jar

Could you please help us?
Please let me know if more information is required.

Thanks in advance.

Hi @sricheta92 @kecheung ,

I am also facing the same issue when running the following sample code with spark-cdm-connector-assembly-synapse-spark3.2-1.19.4.jar:

// COMMAND ----------

// Explicit write, creating an entity in a CDM folder based on a pre-defined model

// Case 2: Using an entity definition defined in a CDM model stored in ADLS

// UPLOAD CDM FILES FIRST
// To run this example, first create a /Models/Contacts folder to your demo container in ADLS gen2,
// then upload the provided Contacts.manifest.cdm.json, Person.cdm.json, Entity.cdm.json files

import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types._

val birthdate = java.sql.Date.valueOf("1991-03-31")
val now = new java.sql.Timestamp(System.currentTimeMillis())
val data2 = Seq(
  Row(1, now, "Donna", "Carreras", birthdate),
  Row(2, now, "Keith", "Harris", birthdate),
  Row(3, now, "Carla", "McGee", birthdate)
)

val schema2 = new StructType()
  .add(StructField("identifier", IntegerType))
  .add(StructField("createdTime", TimestampType))
  .add(StructField("firstName", StringType))
  .add(StructField("lastName", StringType))
  .add(StructField("birthDate", DateType))

// Create the dataframe that matches the CDM definition of the entity, Person
val df2 = spark.createDataFrame(spark.sparkContext.parallelize(data2, 1), schema2)
df2.write.format("com.microsoft.cdm")
  .option("storage", storageAccountName)
  .option("manifestPath", container + "/Data/Contacts/root.manifest.cdm.json")
  .option("entity", "Person")
  .option("entityDefinitionModelRoot", container + "/Models")
  .option("entityDefinitionPath", "/Contacts/Person.cdm.json/Person")
  .mode(SaveMode.Overwrite)
  .save()

  • Do we have to add anything for the Spark 3 CDM connector library? The same code works fine with the Spark 2 connector.
  • Do we have to create root.manifest.cdm.json manually?

Please suggest a workaround, as I have to move from Spark 2 to Spark 3.
Thanks in advance!

@RushikeshGuajr @dlpkmr98 The sample might be out of date. There is a behavior change in the save behavior when moving from Spark 2 to Spark 3: please remove the .mode(SaveMode.Overwrite) option. You are writing the entity for the first time, and because it never existed, you get this error. I have tried the sample and it works for me with that change.
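For reference, here is a sketch of the write call from the sample with that fix applied. This is not a verified drop-in; storageAccountName and container are the placeholders from the original sample, and the call still requires the CDM connector jar and access to the ADLS account:

```scala
// First-time write: omit .mode(...) so the default SaveMode.ErrorIfExists applies.
// The connector creates root.manifest.cdm.json and the Person entity itself.
df2.write.format("com.microsoft.cdm")
  .option("storage", storageAccountName)
  .option("manifestPath", container + "/Data/Contacts/root.manifest.cdm.json")
  .option("entity", "Person")
  .option("entityDefinitionModelRoot", container + "/Models")
  .option("entityDefinitionPath", "/Contacts/Person.cdm.json/Person")
  .save()
```

With no explicit mode, the default ErrorIfExists only fails when the entity already exists, which is the correct behavior for an initial write.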

Added the save-mode changes to the common issues list: #118

It's mentioned briefly here; the default behavior is SaveMode.ErrorIfExists: https://github.com/Azure/spark-cdm-connector/blob/spark3.2/documentation/overview.md#save-mode
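Once the entity exists, an explicit save mode becomes meaningful again. A sketch of a subsequent write, assuming the same df2 and option values as above and assuming (per the linked overview) that the connector accepts SaveMode.Append for an existing entity:

```scala
// Subsequent write: the entity now exists, so the default ErrorIfExists would fail.
// SaveMode.Append adds the new rows to the existing Person entity.
df2.write.format("com.microsoft.cdm")
  .option("storage", storageAccountName)
  .option("manifestPath", container + "/Data/Contacts/root.manifest.cdm.json")
  .option("entity", "Person")
  .mode(SaveMode.Append)
  .save()
```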