Trying to re-create a data warehouse solution using Spark.
Stage 1: The input file is compared with the existing file (snapshot), and records are updated or inserted according to SCD Type 2 (slowly changing dimension).
Stage 2: Save the output of Stage 1 as Avro, with a schema.
Stage 3 (to do): Save the output of Stage 2 as Parquet.
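
The Stage 1 comparison can be sketched in plain Python to show the SCD Type 2 rule on its own, without a Spark cluster. This is only an illustration, not the project's actual code: the function name apply_scd2, the bookkeeping columns valid_from, valid_to and is_current, and the open-ended date 9999-12-31 are all assumptions made for the example. In Spark the same rule would more likely be expressed as a join between the input and the snapshot.

```python
from datetime import date

# Assumed sentinel meaning "this version is still current".
OPEN_END = date(9999, 12, 31)


def apply_scd2(snapshot, incoming, key, tracked, load_date):
    """Return a new snapshot after applying SCD Type 2 to the incoming rows.

    snapshot  -- existing rows, each carrying valid_from / valid_to / is_current
    incoming  -- new input rows (business columns only)
    key       -- name of the business-key column
    tracked   -- columns whose change triggers a new version
    load_date -- date stamped on closed and newly opened versions
    """
    current = {row[key]: row for row in snapshot if row["is_current"]}
    incoming_by_key = {row[key]: row for row in incoming}
    result = []

    # Pass 1: keep history, closing any current version whose attributes changed.
    for row in snapshot:
        new = incoming_by_key.get(row[key])
        changed = (
            row["is_current"]
            and new is not None
            and any(row[c] != new[c] for c in tracked)
        )
        if changed:
            result.append({**row, "valid_to": load_date, "is_current": False})
        else:
            result.append(dict(row))

    # Pass 2: open a new current version for new keys and for changed keys.
    for k, new in incoming_by_key.items():
        old = current.get(k)
        if old is None or any(old[c] != new[c] for c in tracked):
            result.append(
                {**new, "valid_from": load_date, "valid_to": OPEN_END, "is_current": True}
            )
    return result


# Hypothetical sample data: id 1 changes city, id 2 is unchanged, id 3 is new.
snapshot = [
    {"id": 1, "city": "Pune", "valid_from": date(2024, 1, 1), "valid_to": OPEN_END, "is_current": True},
    {"id": 2, "city": "Oslo", "valid_from": date(2024, 1, 1), "valid_to": OPEN_END, "is_current": True},
]
incoming = [
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Lima"},
]
merged = apply_scd2(snapshot, incoming, key="id", tracked=["city"], load_date=date(2024, 1, 2))
```

With this sample data, the old row for id 1 is closed rather than overwritten, id 2 is carried over untouched, and new current rows are opened for id 1 and id 3. Keys that disappear from the input are left as they are in this sketch; whether they should be closed is a design decision the description above does not settle.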