aosabook/500lines

Transaction bug in functional db, the 'ts' field will be incorrect if execute more than one operations in a single transaction?

Opened this issue · 4 comments

mlmhl commented

@yoavrubin

The functional db project is really cool, and I learned a lot from it. After reading it, I have a doubt about the Transaction section. I'm not familiar with Clojure, if I have a wrong understanding, please forgive me, thank you very much.

In the transact-on-db function, each operations will be executed on the initial-db and finally generated an updated database named transacted, and then the newest layer in transacted will be appended to the initial-db as the newest layer. I think there is a bug in the last process if more than one operations need to be executed in an transaction. Consider the following example:

  1. The initial-db has two layers, the first layer is empty(ts is 0), and the second one has only one entity e with an attribute named a, and a's value is v1, ts is 1.
  2. Execute two update operations in a transaction: the first one will update a to v2 and the second one will update a to v3. The first operation will generate a new layer, in which a's value is v2 and ts is 2, prev-ts is 1. The second operation will generate a new layer too, in which a's value is v3, ts is 3, prev-ts is 2.
  3. After above two operations, transact-on-db will append the layer generated by second operation to initial-db's layers, now the initial-db has three layers, the first one's ts is 0, the second one's ts is 1, but the third one's ts is 3, and a's prev-ts is 2 in the third layer. At this time, if we invoke the evolution-of function on initial-db, we will fall into an infinite loop.

The above example is just one possible error, if we execute more than two update operations in a single transaction and then invoke evolution-of, we will fall into a 'layer not found' error.

@mlmhl
Thanks for reading and I am glad you enjoyed it!

Let's start with understanding the semantics of "execute more than two update operations in a single transaction".
A transaction is defined and designed to be a set of operations occurring all together in a single point in time. Two (or more) updates on the same attribute cannot happen at the same time, and allowing that to happen contradicts the definition of transaction, therefore ends up in an undefined state.
The right way to handle such an issue is to add a thorough user's input validation.
Having said that, in our case, the main implementation concerns revolved around demonstrating the ideas behind functional database in a correct, readable and concise manner.
Thorough user input verification, alongside other production-related concerns, such as performance or security, were left aside by design.
In scenario you described we'd fall into an infinite loop when calling evolution-of, which reflects the undefined state that the improper transaction call created (still, if you'd look at datom, they would reflect the intended user call's semantics).
I'd like also to discuss another point in your scenario, which is the timestamps assigned to the datom in the transaction you've defined. The update (a,v2) will have ts of 2 and prev-ts 1, the update (a,v3) will have ts of 2 and prev-ts 2, and the topmost layer at the database would have ts of 2. There would not be a timestamp of 3 anywhere. This is due to the fact that the update to a database's timestamp happens only at the end of transact-on-db, and the timestamp of every intermediate layer created in this function is the timestamp of initial-db +1.

@yoavrubin The build.py script in this project is broken, do you offer an online pdf version of your functional DB, or sth else containing your work?

@Shuumatsu
I'm not aware of an online PDF version.
The chapter itself can be found here.
The complete code can be found here.

@yoavrubin Thank you