k3s-io/kine

question: why does kine not use a DB transaction when it implements the TXN interface?

DFSOrange opened this issue · 9 comments

I'm not sure what you mean; kine does use database transactions when necessary, for example during compaction. Other operations such as inserts and selects are done in a single query to begin with and wouldn't benefit from being run in a transaction.

Sorry, I didn't make it clear. For example, in the TXN interface, the Logstructured type implements the Update operation in several steps:

  1. Get the event from the DB.
  2. Check whether the event revision equals the revision from the TXN IF(xxxx).
  3. Append a new event.

These steps are not one atomic operation. Will that affect Kubernetes?

No, we specifically don't want to do that in a transaction. It needs to fail if another write to that resource has happened since the client last retrieved it; that's how the "resource has been modified" error makes its way back to the Kubernetes API client.
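To illustrate the revision check described above, here is a minimal in-memory sketch. It is not kine's code: `store`, `put`, `update`, and `ErrModified` are hypothetical names, and the map stands in for the database. It only shows that an update carrying a stale revision is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrModified is a hypothetical error standing in for the
// "resource has been modified" response seen by API clients.
var ErrModified = errors.New("resource has been modified")

// store is a toy stand-in for the backend, not kine's real type.
type store struct {
	nextRev int64
	current map[string]int64 // key -> latest mod revision
}

func newStore() *store {
	return &store{current: map[string]int64{}}
}

// put writes unconditionally and returns the new revision.
func (s *store) put(key string) int64 {
	s.nextRev++
	s.current[key] = s.nextRev
	return s.nextRev
}

// update succeeds only if the caller's revision is still the latest one.
func (s *store) update(key string, revision int64) (int64, error) {
	if s.current[key] != revision {
		return s.current[key], ErrModified
	}
	return s.put(key), nil
}

func main() {
	s := newStore()
	rev1 := s.put("key1")             // a client reads the object at rev1
	rev2, _ := s.update("key1", rev1) // another writer gets in first
	_, err := s.update("key1", rev1)  // the stale revision is rejected
	fmt.Println(rev1, rev2, err)
}
```

The rejected caller is expected to re-read the object and retry with the new revision, which is the behaviour the Kubernetes API client relies on.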

Hi, sorry for the late reply. The following example should show what I mean:

Thread A wants to delete key1.
Thread B wants to update key1.

In kine these operations should happen in the following order:

A-get --> A-append --> B-get --> B-append

But this interleaving is also possible:

A-get --> B-get --> B-append --> A-append

That is not a CAS (compare-and-swap) result.

That is not how MVCC databases work. When you submit an update to an existing object, you have to include the revision you are modifying. If another write has occurred since then, the revision will no longer be current and your update will fail.

There are no SQL deletes in the kine model, other than when old revisions are compacted. All operations are inserts, which guarantee atomicity through the auto-increment id column that functions as a revision counter.
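The insert-only model can be sketched with a small append-only log. This is an illustration under assumptions, not kine's schema: `row`, `eventLog`, `add`, and `get` are hypothetical names, and the slice index plays the role of the auto-increment id column. A delete is just another appended row with a deleted flag.

```go
package main

import "fmt"

// row is a simplified, hypothetical log entry.
type row struct {
	id           int64 // auto-increment id, doubles as the revision
	name         string
	value        string
	deleted      bool
	prevRevision int64 // id of the previous row for the same name
}

// eventLog is an append-only slice standing in for the table.
type eventLog struct {
	rows []row
}

// latest returns the most recent row for name, if any.
func (l *eventLog) latest(name string) (row, bool) {
	for i := len(l.rows) - 1; i >= 0; i-- {
		if l.rows[i].name == name {
			return l.rows[i], true
		}
	}
	return row{}, false
}

// add appends a row and returns its id; existing rows are never changed.
func (l *eventLog) add(name, value string, deleted bool) int64 {
	var prev int64
	if last, ok := l.latest(name); ok {
		prev = last.id
	}
	id := int64(len(l.rows)) + 1
	l.rows = append(l.rows, row{id: id, name: name, value: value, deleted: deleted, prevRevision: prev})
	return id
}

// get reads the latest value; a deleted marker means "not found".
func (l *eventLog) get(name string) (string, bool) {
	last, ok := l.latest(name)
	if !ok || last.deleted {
		return "", false
	}
	return last.value, true
}

func main() {
	l := &eventLog{}
	l.add("key1", "v1", false) // create
	l.add("key1", "v2", false) // update
	l.add("key1", "", true)    // delete is also an insert
	_, found := l.get("key1")
	fmt.Println(len(l.rows), found)
}
```

In this sketch the old rows stay in place until something removes them, which corresponds to the compaction step mentioned above.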

Is it possible that both threads pass the revision check and both appends succeed?

image

I do not see updateEvent including the newest revision.

I mean, should we take a table lock before the get and release it after the append?

Sorry, I found that the MySQL schema creates a unique prev_revision index. Is that what you mean?

Yes. The update will fail if the revision in the update request isn't the same as the current revision of the resource (event.KV.ModRevision != revision, where revision is the latest revision of the resource as just retrieved from the database).

If multiple writers attempt to update the resource at the same time using a valid current revision, the second insert will fail due to the unique kine_name_prev_revision_uindex index, which ensures that you can't have two resources with the same key and prev_revision.

So, you can't update the object unless you have the latest revision of it, and you can't make multiple updates to the same revision. No transaction needed.
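The interleaving from earlier in the thread (A-get, B-get, B-append, A-append) can be replayed against a toy model of the unique index. This is a sketch under assumptions: `table`, `get`, and `insert` are hypothetical names, and the `uindex` map only imitates what a unique index on (name, prev_revision) enforces in the real database.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicate imitates the database rejecting a duplicate index entry.
var errDuplicate = errors.New("unique index violation on (name, prev_revision)")

// indexKey is the pair covered by the simulated unique index.
type indexKey struct {
	name         string
	prevRevision int64
}

// table is a toy stand-in for the database table, not kine's code.
type table struct {
	nextID int64
	latest map[string]int64  // name -> latest revision
	uindex map[indexKey]bool // simulated unique index
}

func newTable() *table {
	return &table{latest: map[string]int64{}, uindex: map[indexKey]bool{}}
}

// get returns the latest revision of name (0 if it does not exist).
func (t *table) get(name string) int64 {
	return t.latest[name]
}

// insert appends a new revision; the simulated index allows only one
// row per (name, prevRevision) pair.
func (t *table) insert(name string, prevRevision int64) (int64, error) {
	k := indexKey{name, prevRevision}
	if t.uindex[k] {
		return 0, errDuplicate
	}
	t.uindex[k] = true
	t.nextID++
	t.latest[name] = t.nextID
	return t.nextID, nil
}

func main() {
	tbl := newTable()
	tbl.insert("key1", 0) // key1 now exists at revision 1

	revA := tbl.get("key1") // A-get
	revB := tbl.get("key1") // B-get: both writers see revision 1

	_, errB := tbl.insert("key1", revB) // B-append succeeds
	_, errA := tbl.insert("key1", revA) // A-append hits the unique index
	fmt.Println(errB, errA)
}
```

Both writers pass their own revision check because each compares against what it read, but only one row can claim revision 1 as its predecessor, so the second insert fails without any lock or transaction.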