SETL-Framework/setl

Delta - Read older versions using `versionAsOf`

JorisTruong opened this issue · 2 comments

Describe the bug

I was looking to reproduce the piece of code from the Delta Quickstart here:

df = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta-table")
df.show()

in SETL by using a Connector and a configuration object like this:

deltaDataVersionZero {
  storage = "DELTA"
  path = <path>
  versionAsOf = "0"
  mode = "Overwrite"
}

However, version 0 was not returned when I read the DataFrame; I got the latest version instead.

I think this is due to the reader in DeltaConnector: it does not set any options, so the versionAsOf key is never taken into account.

To Reproduce

We can reproduce this bug in DeltaConnectorSuite, in the test named "test Delta connector update" (line 82).

Add this piece of code at the end of the test, before deltaConnector.drop():

val deltaConnectorOld = new DeltaConnector(DeltaConnectorConf.fromMap(
  Map[String, String](
    "path" -> path,
    "saveMode" -> saveMode.toString,
    "versionAsOf" -> "0"
  )
))

deltaConnectorOld.read().show(false)

We can see that the printed output is the updated data, while it should have been the data from testTable.

Expected behavior

The reader in DeltaConnector should pass the configured options (such as versionAsOf) to the underlying Spark reader.
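
As a rough illustration of the expected behavior, here is a minimal sketch of a reader that forwards time-travel options to Spark. It is not SETL's actual implementation: `DeltaReadSketch`, `timeTravelKeys`, `readerOptions` and `readDelta` are hypothetical names, and the choice to forward only `versionAsOf` and `timestampAsOf` is an assumption made for the example. It also assumes Spark and Delta Lake are on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DeltaReadSketch {
  // Delta Lake time-travel reader options. Limiting the forwarded keys to
  // these two is an assumption of this sketch, not SETL's behavior.
  val timeTravelKeys: Set[String] = Set("versionAsOf", "timestampAsOf")

  // Hypothetical helper: keep only the entries that should reach the reader,
  // dropping connector-level keys such as "path" or "saveMode".
  def readerOptions(conf: Map[String, String]): Map[String, String] =
    conf.filter { case (key, _) => timeTravelKeys.contains(key) }

  // Hypothetical helper: read a Delta table with the selected options applied.
  // Expects a "path" entry in conf.
  def readDelta(spark: SparkSession, conf: Map[String, String]): DataFrame =
    spark.read
      .format("delta")
      .options(readerOptions(conf))
      .load(conf("path"))
}
```

With an existing SparkSession named spark, calling DeltaReadSketch.readDelta(spark, Map("path" -> "/tmp/delta-table", "versionAsOf" -> "0")) should then behave like the Quickstart snippet at the top of this issue.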

thanks for the pr :D

👍