googleapis/python-spanner

database: run_in_transaction with DQL (Data Query Language) statements seems impossible to use and produces "Transaction is not begun" and then "Transaction is already committed"

odeke-em opened this issue · 2 comments

While working on battle testing for observability to inspect expected spans, I wrote this code

def tx_dql_select(tx):
    return tx.execute_sql("SELECT * FROM Singers")

fn = tx_dql_select
try:
     result = database.run_in_transaction(fn)
     if result and hasattr(result, "__iter__"):
            for res in result:
                  print(res)
except Exception as e:
     print(fn.__name__, e)

which produces

tx_dql_select Transaction is not begun

but when I add tx.begin() before the return I get back

tx_dql_select Transaction is already committed

It seems impossible to use, we need to figure out what's up. I should be able to run purely DQL in a transaction.

Kindly cc-ing @sakthivelmanii @harshachinta

This is expected behavior because

  1. txn.executeSql returns a StreamedResultSet, and txn.execute_sql doesn’t actually invoke an RPC until the StreamedResultSet is iterated. This approach reduces network overhead by avoiding an unnecessary RPC call in cases where the result set is never iterated
  2. The results of the read/queries can only be iterated with in the transaction scope because the return type of read/query is a stream rather than the original result. In your example, the iteration happens out of transaction scope.

So you will have to iterate the result set with in the transaction scope like

def tx_dql_select(tx):
  result = tx.execute_sql("SELECT * FROM Singers")
  for row in result:
      print(row)
      
      
database.run_in_transaction(tx_dql_select)

Thank you @harshachinta! In other languages where the results depend on event emitters, it is more natural to return the result and consume it as one needs. Currently I don't see documentation that recommends consumption and usage as you've indicated, perhaps let's work on that?