audienceproject/spark-dynamodb

How do I serialize a DynamoDB column of string set datatype?


Hi guys, thanks for creating this project. It has been a great help to me and I have enjoyed using it so far.

I have a column in my table that is of the string set datatype. It is currently inferred as an Array[String], which gets persisted as a list of strings when written back to DynamoDB. I have tried coercing it to Set[String], but it is still written back to DynamoDB as a list of strings. What datatype should I coerce it to in order to write the column as a string set?

Expected

  "names": {
    "SS": [
      "dummy-name"
    ]
  }

Actual

  "names": {
    "L": [
      {
        "S": "dummy-name"
      }
    ]
  }

Hello!
Thank you for using our library.
The root cause is that Spark does not have a Set type, so the best option is to read the column as an array. Once it has been read, we no longer know that it used to be a Set, and when writing it becomes a List (due to the array->List conversion).
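
To make the loss of type information concrete, here is a minimal sketch in plain Scala. The `Attr`, `StringSet`, and `StringList` types and both functions are hypothetical stand-ins invented for illustration; they are not the library's actual classes or conversion code.

```scala
// Hypothetical, simplified model of two DynamoDB attribute shapes.
// These names are assumptions for illustration, not spark-dynamodb types.
object RoundTripSketch {
  sealed trait Attr
  final case class StringSet(values: Set[String]) extends Attr   // the "SS" shape
  final case class StringList(values: List[String]) extends Attr // the "L" of "S" shape

  // Reading: Spark has no set type, so both shapes end up as a plain sequence.
  def toSparkArray(attr: Attr): Seq[String] = attr match {
    case StringSet(values)  => values.toSeq
    case StringList(values) => values
  }

  // Writing: only the sequence is left, so the writer cannot tell
  // which shape it came from and falls back to a list.
  def fromSparkArray(values: Seq[String]): Attr = StringList(values.toList)
}
```

With this model, `fromSparkArray(toSparkArray(StringSet(Set("dummy-name"))))` comes back as a `StringList`, which mirrors the Expected vs. Actual output in the question.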

I can imagine a few solutions:

  1. Maintain some kind of metadata in Spark about the field's origin type in Dynamo, and use this when writing back into Dynamo
  2. Add an option to write arrays as Set instead of List, perhaps on a per-column basis
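
As a rough sketch of how the two ideas differ, the write side could look something like the following. Everything here is hypothetical: the `origin` map stands in for whatever metadata would be recorded at read time (Spark's `StructField` does carry a `metadata` field that might serve as the carrier), and `setColumns` stands in for a per-column write option that does not exist in the library today.

```scala
// Hypothetical sketch of the two proposals; none of these names exist in spark-dynamodb.
object WriteSketch {
  sealed trait Attr
  final case class StringSet(values: Set[String]) extends Attr   // the "SS" shape
  final case class StringList(values: List[String]) extends Attr // the "L" of "S" shape

  // Solution 1: consult the origin type remembered at read time
  // (assumed here to be a map from column name to "SS" or "L").
  def writeWithOrigin(column: String, values: Seq[String], origin: Map[String, String]): Attr =
    origin.get(column) match {
      case Some("SS") => StringSet(values.toSet)
      case _          => StringList(values.toList) // unknown origin: keep today's behaviour
    }

  // Solution 2: the caller explicitly names the columns to write as sets.
  def writeWithOption(column: String, values: Seq[String], setColumns: Set[String]): Attr =
    if (setColumns.contains(column)) StringSet(values.toSet)
    else StringList(values.toList)
}
```

Either way, converting an array to a set drops duplicates and ordering, and DynamoDB does not accept empty sets, so a real implementation would need to decide how to handle empty arrays.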

I would prefer solution 1. We will consider building it if we have time. PRs are welcome :)