newfront/hitchhikers_guide_to_deltalake_streaming

Add Ignorance Is Bliss Section for Streaming 'ignoreX'...


moving fast is awesome. crashing at lightspeed is not.

  • preventing problems that have to do with "not thinking it through"
  • there is a tie-in to "ooops I did it again" here, since `mergeSchema: true` still prevents type changes, while `overwriteSchema: true` does not:
(spark.read.table(...)
  .withColumnRenamed("x", "y")
  .write
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .saveAsTable(...)
)

not a terrible problem - just a new name - same content

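from pyspark.sql.functions import col, to_date

(spark.read.table(...)
  # converting x to a date replaces the column's type in the table schema
  .withColumn('x', to_date(col('x')))
  .write
  .mode('overwrite')
  .option('overwriteSchema', 'true')
  .saveAsTable(...)
)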

this is bad. this is very bad. We not only lost precision, but we'll also have type conflicts in our active downstream streaming applications....
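
one way to "think it through" before overwriting is to diff the current schema against the new one and refuse to write if anything was dropped or retyped. A minimal sketch, assuming schemas are passed as (column, type) pairs like the ones `DataFrame.dtypes` returns; `schema_changes` and the sample columns below are hypothetical names for illustration, not part of Delta Lake or Spark:

```python
def schema_changes(old_dtypes, new_dtypes):
    """Diff two schemas given as (column, type) pairs (hypothetical helper)."""
    old, new = dict(old_dtypes), dict(new_dtypes)
    return {
        # columns a downstream reader expects but will no longer find
        "dropped": sorted(old.keys() - new.keys()),
        # columns that only exist in the new schema
        "added": sorted(new.keys() - old.keys()),
        # columns that keep their name but change type
        "retyped": sorted(
            (name, old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]
        ),
    }


# assumed sample schemas: x starts out as a timestamp
current = [("id", "bigint"), ("x", "timestamp")]
renamed = [("id", "bigint"), ("y", "timestamp")]
retyped = [("id", "bigint"), ("x", "date")]

print(schema_changes(current, renamed))
print(schema_changes(current, retyped))
```

the rename shows up as one dropped and one added column, and the date conversion shows up under "retyped". Either result is a reasonable signal to stop and check the downstream streams before setting `overwriteSchema`.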