Error when trying to write pyspark dataframe to DynamoDB
Opened this issue · 1 comment
jcerquozzi commented
Hi,
I am trying to write a PySpark DataFrame (read from a Parquet file) to DynamoDB, but I am getting the following error:
AnalysisException: TableProvider implementation dynamodb cannot be written with ErrorIfExists mode, please use Append or Overwrite modes instead.;
The code I am using is:
df = sqlContext.read.parquet(path)
df.write.option("tableName", "dynamo_test") \
    .format("dynamodb") \
    .save()
I then tried:
df.write.option("tableName", "dynamo_test") \
    .format("dynamodb").mode("overwrite") \
    .save()
and got this error:
AnalysisException: Table dynamo_test does not support truncate in batch mode.;;
rehevkor5 commented
I believe Append is the appropriate choice; try adding:
.mode(SaveMode.Append)
The example in the README is misleading for this case. See also the method DynamoDBDataFrameWriter#dynamodb(tableName: String) in implicits.scala; you can see that it specifies SaveMode.Append.
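Since the original snippet is PySpark rather than Scala, the SaveMode enum isn't referenced directly; passing the string "append" to .mode() is the equivalent. A minimal sketch based on the snippet from the issue (same table name and path as in the original post):

df = sqlContext.read.parquet(path)

# Append mode avoids both the ErrorIfExists restriction and the
# unsupported-truncate error seen with Overwrite above
df.write.option("tableName", "dynamo_test") \
    .format("dynamodb") \
    .mode("append") \
    .save()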