saswata-dutta/spark-ingestion

Allow specification of exact schema in conf

saswata-dutta opened this issue · 2 comments

Maybe read a schema DDL string or schema JSON for formats like JSON and CSV, to avoid wrong inference.

DataType.fromJson(schema_json_str)
or
DataType.fromDDL(schema_DDL_str)

Then, spark.read.schema(schema)...
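A minimal sketch of that wiring, assuming the schema string arrives through the job conf (the `schemaDdl` value and input path below are placeholders, and `DataType.fromDDL` needs Spark 2.4+):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StructType}

val spark = SparkSession.builder().appName("ingest").getOrCreate()

// Placeholder conf value; in practice this would come from the job's config.
val schemaDdl = "id BIGINT, name STRING, created_at TIMESTAMP"

// Both parsers return a DataType, so downcast to StructType for the reader.
val schema = DataType.fromDDL(schemaDdl).asInstanceOf[StructType]
// or: DataType.fromJson(schemaJsonStr).asInstanceOf[StructType]

val df = spark.read
  .schema(schema)
  .json("s3://bucket/input/") // placeholder path; same pattern for .csv(...)
```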

NB: what happens to rows which don't conform to the schema? For CSV/JSON, consider columnNameOfCorruptRecord.
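A sketch of how that option could be used, assuming the default PERMISSIVE mode and a placeholder `events.csv` input; the corrupt-record column must be declared in the schema as a nullable string for malformed rows to land there:

```scala
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType),
  StructField("_corrupt_record", StringType) // receives the raw malformed row
))

val df = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .csv("events.csv")

// Since Spark 2.3, querying only the corrupt-record column of raw CSV/JSON
// is disallowed, so cache first before splitting out the bad rows.
df.cache()
val bad = df.filter(df("_corrupt_record").isNotNull)
```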

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L650

For Mongo-Spark no such option exists, so use an explicit case class and maybe clean-frames;
but how do we specify the class name of the schema from conf?

https://stackoverflow.com/questions/23785439/getting-typetag-from-a-classname-string

https://github.com/funkyminds/cleanframe
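Following the StackOverflow answer above, a TypeTag can be materialized at runtime from a fully-qualified class name string; the sketch below is illustrative only (the `com.example.MyRecord` name is hypothetical, and `ScalaReflection.schemaFor` is a catalyst-internal API that may change between Spark versions):

```scala
import scala.reflect.api
import scala.reflect.runtime.universe._

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

// Build a TypeTag from a fully-qualified class name (per the SO answer above).
def stringToTypeTag[A](name: String): TypeTag[A] = {
  val mirror = runtimeMirror(getClass.getClassLoader)
  val tpe = mirror.staticClass(name).selfType
  TypeTag(mirror, new api.TypeCreator {
    def apply[U <: api.Universe with Singleton](m: api.Mirror[U]): U#Type =
      tpe.asInstanceOf[U#Type]
  })
}

// Hypothetical FQCN taken from conf; derive the Spark schema from the type.
val tag = stringToTypeTag[Product]("com.example.MyRecord")
val schema = ScalaReflection.schemaFor(tag.tpe).dataType.asInstanceOf[StructType]
```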


Use the MongoSpark builder to specify the SparkSession and ReadConfig, and pass the schema's case class to the toDF terminator.
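A minimal sketch of that, assuming the 2.x mongo-spark connector; the `MyRecord` case class and the ReadConfig values are placeholders:

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession

case class MyRecord(id: Long, name: String) // hypothetical schema class

val spark = SparkSession.builder().appName("mongo-ingest").getOrCreate()

val readConfig = ReadConfig(Map(
  "uri"        -> "mongodb://localhost:27017", // placeholder
  "database"   -> "db",
  "collection" -> "coll"
))

val df = MongoSpark.builder()
  .sparkSession(spark)
  .readConfig(readConfig)
  .build()
  .toDF[MyRecord]()
```

Note that toDF[T] needs a TypeTag at the call site, which is exactly where the reflection sketch above would come in if the class name only arrives as a string in conf.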