audienceproject/spark-dynamodb

Incomplete schema inference while reading from DynamoDB table

Opened this issue · 3 comments

DynamoDB Table:
[Screenshot of the DynamoDB table, 2021-01-21]

I am reading the above table using the following code:

val df = spark.read
  .format("dynamodb")
  .option("tableName", config.tableName)
  .option("region", config.ddbConfig.region)
  .load()
df.show()

Result:
+----+-------------------+----+
|s_id|         created_on|p_id|
+----+-------------------+----+
| 002|2018-11-20 12:01:19|   2|
| 001|2018-11-19 12:01:19|   1|
| 006|2018-11-20 12:01:19|   6|
| 005|2018-11-19 12:01:20|   5|
| 004|2018-12-19 12:01:19|   4|
| 003|2019-11-19 12:01:19|   3|
+----+-------------------+----+

The "num" column is missing from the resulting DataFrame. Why did this happen? Is there a flag I need to set to ensure complete schema inference?

You can pass your schema explicitly; otherwise the connector infers the schema only from the first page of the DynamoDB scan, so attributes that don't appear in those items (like "num") are dropped.
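A minimal sketch of passing an explicit schema via the standard `.schema()` method on `DataFrameReader`. The column names come from the table above; the types (and `LongType` for "num" in particular) are assumptions, so adjust them to match your actual items:

```scala
import org.apache.spark.sql.types._

// Explicit schema covering all attributes, including "num",
// which schema inference missed. Types here are assumptions.
val tableSchema = StructType(Seq(
  StructField("s_id", StringType, nullable = true),
  StructField("created_on", StringType, nullable = true),
  StructField("p_id", LongType, nullable = true),
  StructField("num", LongType, nullable = true)
))

val df = spark.read
  .format("dynamodb")
  .option("tableName", config.tableName)
  .option("region", config.ddbConfig.region)
  .schema(tableSchema) // explicit schema bypasses first-page inference
  .load()
```

With an explicit schema the connector does not need to sample the table at all, so sparse attributes are always present (as nulls where an item lacks them).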

Thanks! This helps.

This library returned an empty DataFrame when I tried to read a DDB table that has both a hash key and a range key. Is this known behaviour?

@siah210 you should pass the schema with the `.schema()` method, just as you would for a normal DataFrame; that should work.