Allow specification of exact schema in conf
saswata-dutta opened this issue · 2 comments
Maybe read a DDL string or a schema JSON for formats like JSON and CSV, to avoid wrong schema inference.
`DataType.fromJson(schemaJsonStr)`
or
`DataType.fromDDL(schemaDdlStr)`
Then, `spark.read.schema(schema)...`
NB: what happens to rows that don't conform to the schema? For CSV/JSON, consider the `columnNameOfCorruptRecord` option.
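A minimal sketch of the idea in Spark's Scala API; the conf key `spark.myapp.schema.ddl`, the corrupt-record column name, and the input path are placeholders, not existing conventions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().appName("schema-from-conf").getOrCreate()

// Schema supplied via conf as a DDL string (key name is hypothetical),
// e.g. "id INT, name STRING, _corrupt STRING"
val schemaDdl = spark.conf.get("spark.myapp.schema.ddl")
val schema = StructType.fromDDL(schemaDdl)

// With PERMISSIVE mode, non-conforming rows land in the corrupt-record
// column instead of failing the whole read
val df = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt")
  .json("/path/to/input.json")
```

`DataType.fromJson` would work the same way, with the conf value holding the schema's JSON representation instead of DDL.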
For Mongo-spark no such option exists, so use an explicit case class and maybe clean-frames;
but then, how to specify the class name of the schema in conf?
https://stackoverflow.com/questions/23785439/getting-typetag-from-a-classname-string
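The linked answer boils down to constructing a `TypeTag` via runtime reflection from a fully-qualified class name; a sketch under that assumption (the helper name `typeTagOf` is ours):

```scala
import scala.reflect.runtime.{universe => ru}
import scala.reflect.api

// Build a TypeTag from a fully-qualified class name given as a String,
// e.g. the value of a conf key naming the schema case class
def typeTagOf(className: String): ru.TypeTag[_] = {
  val mirror = ru.runtimeMirror(getClass.getClassLoader)
  val tpe = mirror.staticClass(className).selfType
  ru.TypeTag(mirror, new api.TypeCreator {
    def apply[U <: api.Universe with Singleton](m: api.Mirror[U]): U#Type =
      if (m eq mirror) tpe.asInstanceOf[U#Type]
      else throw new IllegalArgumentException(
        s"TypeTag created in $mirror cannot be migrated to $m")
  })
}
```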
Use the `MongoSpark` builder to specify the SparkSession and ReadConfig, and pass the schema type as an argument to the `toDF` terminator.
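Roughly, with the mongo-spark connector's builder API; the case class, URI, and collection here are illustrative placeholders:

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession

// Explicit schema expressed as a case class instead of inference
case class User(name: String, age: Int)

val spark = SparkSession.builder().appName("mongo-read").getOrCreate()

// Placeholder URI pointing at the source database/collection
val readConfig = ReadConfig(Map("uri" -> "mongodb://localhost/db.users"))

val df = MongoSpark.builder()
  .sparkSession(spark)
  .readConfig(readConfig)
  .build()
  .toDF[User]()   // schema comes from the case class type parameter
```

Combined with the reflection trick above, the case class name itself could come from conf, though wiring a runtime `TypeTag` into `toDF` is the awkward part.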