audienceproject/spark-dynamodb

Error when trying to write pyspark dataframe to DynamoDB

Opened this issue · 1 comment

Hi,

I am trying to write a pyspark dataframe (that comes from a parquet file) to DynamoDB, but I am getting the following error:

AnalysisException: TableProvider implementation dynamodb cannot be written with ErrorIfExists mode, please use Append or Overwrite modes instead.;

The code I am using is:

df = sqlContext.read.parquet(path)

df.write.option("tableName", "dynamo_test") \
            .format("dynamodb") \
            .save()

I tried putting

df.write.option("tableName", "dynamo_test") \
                .format("dynamodb").mode("overwrite") \
                .save()

And got error:

AnalysisException: Table dynamo_test does not support truncate in batch mode.;;

I believe Append is the appropriate choice here; try adding:

.mode(SaveMode.Append)

(in PySpark, the equivalent string form is .mode("append")). The README example is misleading on this point. See also the method DynamoDBDataFrameWriter#dynamodb(tableName: String) in implicits.scala — you can see that it specifies SaveMode.Append.
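Putting it together, a minimal PySpark sketch of the working write — assuming a SparkSession named spark, the spark-dynamodb connector on the classpath, an existing DynamoDB table named "dynamo_test", and path pointing at the parquet data:

```python
# Read the parquet data and write it to DynamoDB in append mode.
# "append" is the PySpark string form of Scala's SaveMode.Append.
df = spark.read.parquet(path)

df.write.format("dynamodb") \
    .option("tableName", "dynamo_test") \
    .mode("append") \
    .save()
```

This avoids both errors above: the default ErrorIfExists mode is rejected outright by the connector, and "overwrite" fails because the connector does not implement truncate in batch mode.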