audienceproject/spark-dynamodb

DynamoDB error: batch write request items is empty


Hi folks, I'm trying to do a fairly simple write to DynamoDB, but it seems the BatchWriteItem request contains no items. Here's the error from the live AWS API:

Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 validation error detected: Value '{my-table=[]}' at 'requestItems' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 25, Member must have length greater than or equal to 1] (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: NP921O0BVIO5OEMBTANMMJ07Q3VV4KQNSO5AEMVJF66Q9ASUAAJG)
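
For reference, the constraint being violated is BatchWriteItem's rule that every entry in the RequestItems map must hold between 1 and 25 write requests; the error shows an empty list ({my-table=[]}) being submitted. A minimal sketch that reproduces the same validation error with the AWS SDK directly (default credentials assumed; "my-table" is taken from the error message above):

import java.util.Collections

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.dynamodbv2.model.{BatchWriteItemRequest, WriteRequest}

// Each RequestItems value must contain 1..25 write requests;
// sending an empty list for a table fails validation.
val client = AmazonDynamoDBClientBuilder.defaultClient()
val request = new BatchWriteItemRequest()
  .withRequestItems(Collections.singletonMap("my-table", Collections.emptyList[WriteRequest]()))
client.batchWriteItem(request) // throws the 400 ValidationException above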

Here's my code:

// spark-dynamodb's DataFrameWriter extension comes from its implicits
import com.audienceproject.spark.dynamodb.implicits._

val countsDS = countsStep.run(communitiesDS)
val artists = countsDS
  .as[FeedTopItem]
  .filter(item => item._type == "artistCounts")
  .map(item => ArtistCount(s"${item.feed}:${item.key}", dateWithHour, item.value))
  .as[ArtistCount]
artists.show()
+--------------------+------------+-----+
|      feedWithArtist|dateWithHour|count|
+--------------------+------------+-----+
|    x:Shanti Celeste|  2019120818|    1|
|            x:Khotin|  2019120818|    1|
|         x:Chillinit|  2019120818|    1|
|          x:Good Gas|  2019120818|    1|
|          y:Yo Trane|  2019120818|    1|
|              y:dvsn|  2019120818|    1|
|             y:Belly|  2019120818|    1|
|               y:NAV|  2019120818|    1|
+--------------------+------------+-----+
artists.write.dynamodb("mtrAggregations")

I'm producing a regular Dataset[MyCaseClass] and saving it; nothing unusual. I've also tried

artists.toDF("feedWithArtist", "dateWithHour", "count").write.dynamodb("...")

but that also doesn't work.

I'm running Spark 2.4.4, Scala 2.11, and "com.audienceproject" %% "spark-dynamodb" % "1.0.0".

Hi Rory,
We'll take a look and see if we can replicate the issue.

Great, thanks. For context, the data is loaded from S3 JSON files (not from DynamoDB). I'm only trying to write to DynamoDB.

Also, for reference, here's the Dataset's case class: case class ArtistCount(feedWithArtist: String, dateWithHour: Long, count: Long)

@cosmincatalin I realized that it's most likely due to empty partitions. The non-empty partitions are being written to DynamoDB, but the empty ones are throwing this error.

Calling .repartition() before .dynamodb("...") works around the issue, but it incurs a shuffle.
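
For anyone hitting this before the fix lands, the failure mode is straightforward: a batch write is issued per partition, and an empty partition yields an empty RequestItems list. A sketch of the kind of guard that avoids it (illustrative only, not the connector's actual code; the client setup and writePartition helper are assumptions):

import java.util.Collections

import scala.collection.JavaConverters._

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.dynamodbv2.model.{BatchWriteItemRequest, WriteRequest}

val client = AmazonDynamoDBClientBuilder.defaultClient()

// grouped(25) yields no batches at all for an empty iterator, so empty
// partitions never reach the API, and every batch sent has 1..25 items.
def writePartition(table: String, requests: Iterator[WriteRequest]): Unit =
  requests.grouped(25).foreach { batch =>
    val items = Collections.singletonMap(table, batch.asJava)
    client.batchWriteItem(new BatchWriteItemRequest().withRequestItems(items))
  }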

Hi Rory. Thank you for reporting this. We will fix it in the next release (1.0.1).

Version 1.0.1 is now on Maven Central.
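
To pick up the fix, bump the dependency in build.sbt:

libraryDependencies += "com.audienceproject" %% "spark-dynamodb" % "1.0.1"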