database.batch does not retry aborted transactions
olavloite opened this issue · 4 comments
The standard example for writing data with mutations uses database.batch: https://cloud.google.com/spanner/docs/getting-started/python#write-data-with-mutations
database.batch, however, does not automatically retry the transaction if Spanner aborts it. This causes errors if you use this method to insert a large amount of data, or if there is lock contention on the data that you insert.
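For reference, the mutation-based write pattern in question looks roughly like the sketch below. It assumes a Singers table with SingerId, FirstName and LastName columns, as in the getting-started guide, and that database is an existing google-cloud-spanner Database object. As I understand the library, the mutations are only buffered inside the with block and are committed when the block exits, so an Aborted error from that commit reaches the caller directly.

```python
def insert_singers_with_batch(database):
    """Write rows with database.batch(); an aborted commit is not retried."""
    # database is assumed to be a google-cloud-spanner Database instance.
    with database.batch() as batch:
        # insert() only buffers a mutation; nothing is sent to Spanner yet.
        batch.insert(
            table="Singers",
            columns=("SingerId", "FirstName", "LastName"),
            values=[
                (1, "Marc", "Richards"),
                (2, "Catalina", "Smith"),
            ],
        )
    # Leaving the with block commits the batch. If Spanner aborts the
    # transaction, the exception propagates from here without a retry.
```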
Either:
- the sample(s) should be updated to show how to use run_in_transaction with mutations, and/or
- database.batch should also automatically retry aborted transactions.
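The first option could look like the following sketch, again assuming the Singers table from the getting-started guide. The transaction object passed to the function supports the same mutation methods as a batch (insert, update, insert_or_update, replace, delete), and run_in_transaction retries the whole function when Spanner aborts the transaction. insert_singers and write_singers are illustrative names, not part of the library.

```python
def insert_singers(transaction):
    """Unit of work for run_in_transaction; it may be called more than once."""
    # Mutations are buffered on the transaction and sent with its commit.
    transaction.insert(
        table="Singers",
        columns=("SingerId", "FirstName", "LastName"),
        values=[
            (1, "Marc", "Richards"),
            (2, "Catalina", "Smith"),
        ],
    )


def write_singers(database):
    # run_in_transaction begins a transaction, calls insert_singers and
    # commits. If Spanner aborts the transaction, it backs off and runs
    # insert_singers again.
    return database.run_in_transaction(insert_singers)
```

Because the function can run more than once, it should not have side effects outside the transaction.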
After reviewing the database.batch API, we found that commit calls within a batch are automatically retried, with a default retry count of 5. You can refer to this line in the code for more details: link. The batch.insert API simply appends values to the mutations list, and these values are persisted in the database when the commit API is called on the Spanner client. This behavior is consistent across all Batch APIs defined here: link.
Note that these retries are only triggered by an InternalServerError exception (for more information, see: link). As far as I can tell, they are not triggered when an Aborted exception occurs during transaction execution.
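The behaviour described in the two paragraphs above can be modelled with a small stand-alone sketch. ToyBatch, commit_rpc and the two exception classes are hypothetical stand-ins rather than the real google-cloud-spanner types, and treating the retry count of 5 as five total attempts is an assumption. It shows that insert only buffers mutations, and that commit retries InternalServerError but lets Aborted escape on its first occurrence.

```python
class InternalServerError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.InternalServerError."""


class Aborted(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.Aborted."""


class ToyBatch:
    """Toy model of the batch behaviour described above (not the real class)."""

    def __init__(self, commit_rpc, max_attempts=5):
        self._commit_rpc = commit_rpc  # callable that "sends" the mutations
        self._max_attempts = max_attempts
        self.mutations = []

    def insert(self, table, columns, values):
        # Only records a mutation locally; nothing is sent until commit().
        self.mutations.append(("insert", table, tuple(columns), list(values)))

    def commit(self):
        # Only InternalServerError is retried. Any other exception, including
        # Aborted, escapes from the first failing attempt.
        for attempt in range(1, self._max_attempts + 1):
            try:
                return self._commit_rpc(self.mutations)
            except InternalServerError:
                if attempt == self._max_attempts:
                    raise
```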
Yeah, this bug is specifically about Aborted errors. If Spanner aborts the transaction (meaning: Spanner returns an error with error code Aborted), then the transaction should be retried. The retry mechanism should be the same as for run_in_transaction: it should back off and retry, using the back-off value that is included in the Aborted error. There should not be a maximum number of retries; instead, it should stop retrying once the deadline has been exceeded.
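A minimal sketch of the requested retry behaviour, under stated assumptions: the Aborted class below is a hypothetical stand-in that exposes the server-suggested back-off as a retry_delay attribute in seconds (in the real client that delay comes from the RetryInfo details of the gRPC error), and attempt is a hypothetical callable that begins a new transaction, applies the mutations and commits. When no delay is present, the sketch falls back to jittered exponential back-off, which is my own choice. There is no attempt limit; retrying stops only when the next sleep would pass the deadline.

```python
import random
import time


class Aborted(Exception):
    """Hypothetical stand-in for an Aborted error carrying a server back-off."""

    def __init__(self, message="aborted", retry_delay=None):
        super().__init__(message)
        self.retry_delay = retry_delay  # seconds suggested by the server, or None


def commit_with_abort_retry(
    attempt,
    timeout_secs=120.0,
    base_delay=0.01,
    max_delay=32.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call attempt() until it succeeds, retrying on Aborted until the deadline."""
    deadline = clock() + timeout_secs
    backoff = base_delay
    while True:
        try:
            # attempt() must redo the whole unit of work in a new transaction.
            return attempt()
        except Aborted as exc:
            if exc.retry_delay is not None:
                # Prefer the back-off value included in the Aborted error.
                delay = exc.retry_delay
            else:
                # Fallback: jittered exponential back-off, capped at max_delay.
                delay = random.uniform(0, backoff)
                backoff = min(backoff * 2, max_delay)
            if clock() + delay >= deadline:
                raise  # sleeping would overshoot the deadline
            sleep(delay)
```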
I understand. The run_in_transaction method currently evaluates the deadline from its keyword arguments, as defined here: link. In my opinion, it's the client's responsibility to pass the timeout (in seconds) as part of the keyword arguments for the transaction. For our use case, how should we evaluate the deadline? Should we define a default timeout for all mutation operations and use it to set the deadline, or is there another approach we should consider?
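The first option in the question could be sketched as follows: accept an optional timeout_secs keyword argument and fall back to a default when the caller omits it. DEFAULT_MUTATION_TIMEOUT_SECS and resolve_deadline are hypothetical names, and both the keyword name and the 120-second value mirror what I believe run_in_transaction uses; treat them as assumptions and check the current source.

```python
import time

# Hypothetical default, assumed to match run_in_transaction's fallback.
DEFAULT_MUTATION_TIMEOUT_SECS = 120.0


def resolve_deadline(kwargs, default_timeout_secs=DEFAULT_MUTATION_TIMEOUT_SECS,
                     clock=time.monotonic):
    """Return an absolute deadline, letting callers override the timeout.

    Pops "timeout_secs" from kwargs so it is not forwarded to the commit call.
    """
    timeout_secs = kwargs.pop("timeout_secs", None)
    if timeout_secs is None:
        # No explicit timeout from the caller: use the default.
        timeout_secs = default_timeout_secs
    if timeout_secs <= 0:
        raise ValueError("timeout_secs must be a positive number of seconds")
    return clock() + timeout_secs
```

Popping the keyword keeps it out of any arguments later forwarded to the underlying commit, and a single module-level default gives all mutation operations the same behaviour unless a caller opts out.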