[Question] DataRace Exception, possible bug
bgervan opened this issue · 3 comments
Hi,
Currently, the writer checks for data races with:
```lua
if not is_delete then
    -- check to make sure we don't have a data race condition
    local updated = {}
    for i, pair in ipairs(cjson.decode(ARGV[13])) do
        local odata = redis.call('HGET', row_key, pair[1])
        if odata ~= pair[2] then
            table.insert(updated, pair[1])
        end
    end
    if #updated > 0 then
        return cjson.encode({race=updated})
    end
end
```
But in the following use case, the above will throw an exception (return a race):
- Objects are only written/updated in the DB, never deleted
- The only requirement is to push the latest state to the DB
- Multiple services may send the very same object at the same time

So if a second writer sends an object after the first one has already saved it, the save returns an error for no good reason.
Question:
Do I need this check in update-only environments? Or what would be a good check in our case?
There can be cases where both writers get stuck and stop updating past a certain point, even with a force save.
Wouldn't it be better to use a timestamp to identify which update is newer, and apply the above check only when the object has been deleted?
Thank you for the answers.
Please see: #116
Does the force_update_data() function cover your use-case? Should I add that to the library?
Thanks for the suggestion, but not exactly: I already have a force update in the code, and I still get this exception.
The force update you are referring to looks suboptimal in a low-latency system; it will do more queries in bad situations (correct me if I am wrong).
In a use case where the user doesn't care whether another writer changed the object that is about to be updated, the above data race check throws errors without need.
Let me give you an example:
The event object for event type x:
- user_id (unique)
- timestamp (always updated by writer)
The usage:
- On event x:
  - The system checks whether event x happened within the last y seconds
  - If it did: do nothing
  - If it did not (like a timeout): save the new timestamp with the same id -- this could be done by multiple writers at the "same" time
This is a very similar mechanism to the one I am using with rom, and there are other usages where I ran into the same data race exceptions.
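For concreteness, here is a minimal sketch of that pattern using rom's standard Model/column API (the Event model, its fields, and touch_event are hypothetical names invented for illustration, not code from this issue):

```python
import time

import rom

# Hypothetical model mirroring the event object described above.
class Event(rom.Model):
    user_id = rom.Integer(required=True, unique=True)
    timestamp = rom.Float(required=True)

def touch_event(user_id, y):
    """On event x: do nothing if it happened within the last y seconds,
    otherwise overwrite the timestamp. Multiple writers can race here,
    which is where rom's data race check fires."""
    now = time.time()
    event = Event.get_by(user_id=user_id)
    if event and now - event.timestamp < y:
        return  # happened recently; do nothing
    if event:
        event.timestamp = now
        event.save()  # may fail with a race error if another writer won
    else:
        Event(user_id=user_id, timestamp=now).save()
```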
The goal would be:
- Update only if the stored timestamp is lower than the writer's timestamp, which is a more general way to recognize data races.
If we had an option to skip

```lua
if not is_delete then
    -- check to make sure we don't have a data race condition
```

and check last-updated timestamps instead of the raw data comparison, that would help a lot.
I can try to put together a small example project if it helps. The reason I am not simply deleting that check is that I may get bad side effects in real data race conditions.
To sum up: "sometimes you don't care about what is there, and you just want to overwrite".
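For what it's worth, the "last write by timestamp wins" semantics being asked for can be sketched as a plain redis-py Lua script outside of rom (SET_IF_NEWER and the Event:1 key are illustrative names; this bypasses rom's indexes entirely and is only meant to show the intended check):

```python
import time

import redis

r = redis.Redis()

# Update the stored timestamp only if the incoming one is newer.
SET_IF_NEWER = r.register_script("""
local current = tonumber(redis.call('HGET', KEYS[1], 'timestamp'))
if current == nil or tonumber(ARGV[1]) > current then
    redis.call('HSET', KEYS[1], 'timestamp', ARGV[1])
    return 1
end
return 0
""")

# Returns 1 if this write won, 0 if a newer timestamp was already stored.
won = SET_IF_NEWER(keys=['Event:1'], args=[time.time()])
```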
Indeed, in cases where you don't care about overwriting data (because you don't care, or because you know there isn't a race condition), updating your local entity repeatedly will do more round trips than necessary, and will induce more latency, especially for commonly accessed entities.
**Use a lock (what Josiah would do)**
However, I'm not feeling great about adding a feature to disable the safety I added in order to prevent data and index corruption. If you wanted to ensure that this never happens, you could use an explicit lock. Lock around the object, fetch/refresh, update local object, write, unlock. That guarantees that your round trips / delays are all up front in the lock, not in the write / update. It also gives you the ability to "deadline" the changes, so if it takes longer than a certain period of time, bail out. That makes sure that everything that I care about WRT data integrity is upheld. That's what I would recommend.
```python
# data is assumed to be an iterable of (attribute, value) pairs
with rom.util.EntityLock(entity, acquire_timeout, lock_timeout) as lock:
    entity.refresh()
    for attr, value in data:
        setattr(entity, attr, value)
    entity.save()
```
With a little effort, you can turn the above 5x IOs (get + lock acquire + refresh + save + release) into 4 (lock acquire, get, save, release).
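One hedged way to get that 4-IO ordering, sketched with redis-py's built-in lock keyed on the primary key rather than rom's EntityLock (which needs the entity object up front); the 'lock:<Model>:<id>' key is an invented convention, not rom's, and a plain redis-py lock will not coordinate with EntityLock users:

```python
import redis

r = redis.Redis()

# Sketch only: acquire the lock first, then fetch, so the get() happens
# inside the lock (lock acquire, get, save, release = 4 round trips).
def locked_update(model, id, data):
    with r.lock('lock:%s:%s' % (model.__name__, id),
                timeout=5, blocking_timeout=1):
        entity = model.get(id)
        for attr, value in data.items():
            setattr(entity, attr, value)
        entity.save()
```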
**OR ignore most of the internals (you can do this, but you might break stuff)**
Alternatively, if you're not indexing this data at all, and don't care about data integrity (seems like it), you can always do the following on some encoded data:
```python
def unsafe_update_anything(model, id, dct):
    entity = model.get(id)
    entity._connection.hmset(entity._pkey, dct)
    # entity is now updated, took 2 round trips
    # with effort can be turned into just the hmset call for 1 round trip
```
This does 2 round trips. With a bit more effort, you can get that primary key in advance for anything and call the model's ._connection.hmset(pkey, dct) directly for 1 round trip.
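A sketch of that 1-round-trip variant, with model, id, and dct as in the snippet above; the '<ModelName>:<id>' hash key layout is assumed from what entity._pkey exposes and is a rom internal detail, so verify it against your rom version before relying on it:

```python
# Assumed key layout; check entity._pkey on a real entity first.
pkey = '%s:%s' % (model.__name__, id)
model._connection.hmset(pkey, dct)  # one round trip, indexes not updated
```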
So, 1 - 5 round trips, depending on safety level / perf optimizations. Does that work?