cosmoss-jigu/mv-rlu

Copy object pointer points to write set

Opened this issue · 8 comments

Hi,

I am using mv-rlu and have encountered a strange bug: the latest copy of an object I am trying to deref points to a write set in the log. Do you have any ideas on what could cause this?

I cast the copy object pointer to mvrlu_wrt_set_t * and verified that it is a valid write set: its wrt_clk matches that of the copy objects following it, and its num_objs equals the number of copy objects following it. Furthermore, this write set seems unrelated to the object I am trying to deref, since the master object pointer of every copy object in the write set differs from that object.
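
A minimal sketch of that consistency check, assuming a simplified layout. Only wrt_clk and num_objs are names from the discussion; the struct types, the p_master field, and both helper functions are hypothetical stand-ins, and the real log stores variable-sized copy objects rather than a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the mv-rlu log structures.
 * Only wrt_clk and num_objs are field names taken from the discussion. */
typedef struct {
    uint64_t wrt_clk;      /* clock stamped on this copy object */
    const void *p_master;  /* master object this copy belongs to (assumed name) */
} sketch_copy_hdr_t;

typedef struct {
    uint64_t wrt_clk;      /* clock shared by every copy in the set */
    size_t num_objs;       /* number of copy objects following the header */
} sketch_wrt_set_t;

/* True if the header is consistent with the n copy headers that follow it:
 * the count matches and every copy carries the same wrt_clk. */
static bool looks_like_wrt_set(const sketch_wrt_set_t *ws,
                               const sketch_copy_hdr_t *objs, size_t n)
{
    if (ws->num_objs != n)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (objs[i].wrt_clk != ws->wrt_clk)
            return false;
    }
    return true;
}

/* True if any copy in the set belongs to the given master object. */
static bool wrt_set_references(const sketch_copy_hdr_t *objs, size_t n,
                               const void *master)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i].p_master == master)
            return true;
    }
    return false;
}
```

In the failing case described above, the first check would pass for the block the pointer lands on, while the second would return false for the object being deref'ed.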

One possible reason is that the copy of this object has already been reclaimed, but the pointer to the latest copy was somehow not set to NULL during reclamation and so now points to unrelated data written to the log later. However, the reclamation code looks correct to me, and I am out of ideas.

I made some changes to the mv-rlu code to allow nested locking and calling mvrlu_deref() after mvrlu_try_lock(); could this lead to the issue? I need these changes because the critical section spans multiple functions. We are building a data structure library that allows users to write serializable transactions that invoke multiple data structure operations. Users can invoke the operations in whatever way they want, so the library must handle the cases where the same object is locked multiple times or mvrlu_deref() is called on a locked object. The changes can be seen here: master...dslab-epfl:fnso

My setup:

  • Intel(R) Xeon(R) Gold 6248R CPU (Our app used only 8 cores from the first socket)
  • Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)
  • mvrlu built with make libmvrlu-ordo.a CC=gcc-8, ordo value is set

Cheers,
Lei

@madhavakrishnan If you have time, can you take a look at this.

Hi Lei,
I quickly reviewed your mvrlu_deref changes and I think the potential issue lies in the modified logic. Let me explain why.
When you allow nested try_lock, mvrlu does not create a new copy object (in its tvlog); instead, it reuses the previously allocated copy_obj (the one allocated during the first try_lock). When you deref a master_obj that is currently locked, the original mv-rlu code derefs the latest committed copy (not the uncommitted copy_obj in the calling thread's or another thread's write set). With your changes, however, the deref logic returns the uncommitted copy_obj, which is present only in the write_set (not in the version chain). Note that only a committed copy_obj is moved to the version chain. So I guess it makes sense that your dereferenced copy points to the write_set in the log. There are two possible bug scenarios I can think of:

(1) You intend to deref the latest committed copy_obj (i.e., the one present in the version chain) of a particular master_obj, but because of your mvrlu_deref modification it dereferences the uncommitted copy_obj in the write_set. This is possible because your current deref logic does not traverse the version chain if the master_obj being dereferenced is locked by the calling thread, so the copy_obj pointer you get points into the write_set. I guess you can verify this by checking the wrt_clk of the write_set: if wrt_clk == MAX_VERSION, it is most likely this scenario.
(2) You deref the uncommitted copy_obj and the transaction aborts, but your application still holds a pointer to that object. When the next transaction starts, a new copy_obj is written to the same tvlog space (i.e., over the previous copy_obj). That is why your copy_obj is not part of the write_set: the previous write_set has been overwritten by the new transaction.

That being said, I believe that dereferencing an uncommitted copy_obj is a violation of transaction semantics, even if it is done by the same thread that holds the lock. To answer your last question: calling try_lock multiple times on the same object is never an issue, as try_lock will simply return the first copy_obj created. Multiple deref calls are also okay as long as you only need the latest committed copy_obj, but if your application needs access to the uncommitted copy_obj, I guess it has to be handled on the application side.
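
The distinction between committed and uncommitted copies can be illustrated with a simplified model. This is a sketch under stated assumptions, not the library's code: the types, the p_older link, the MODEL_MAX_VERSION sentinel, and the `<=` visibility rule are all hypothetical; only p_copy and wrt_clk are names used in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel meaning "this copy has not been committed yet". */
#define MODEL_MAX_VERSION UINT64_MAX

typedef struct model_copy {
    uint64_t wrt_clk;            /* commit clock, or MODEL_MAX_VERSION */
    struct model_copy *p_older;  /* next older copy (hypothetical name) */
    int value;                   /* payload, kept trivial for the sketch */
} model_copy_t;

typedef struct {
    model_copy_t *p_copy;  /* newest copy linked to the master, or NULL */
    int value;             /* payload of the master object itself */
} model_master_t;

/* Walk the chain from newest to oldest and return the first copy that is
 * committed and not newer than the reader's snapshot clock. Uncommitted
 * copies are skipped. With no visible copy, fall back to the master. */
static const int *model_deref(const model_master_t *m, uint64_t local_clk)
{
    for (const model_copy_t *c = m->p_copy; c != NULL; c = c->p_older) {
        if (c->wrt_clk != MODEL_MAX_VERSION && c->wrt_clk <= local_clk)
            return &c->value;
    }
    return &m->value;
}
```

In this model a reader never receives a pointer to a copy whose wrt_clk is still the sentinel, which corresponds to the check suggested in scenario (1).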

I hope this information is useful. Please let me know if you have any other questions. Happy to answer!

Best regards,
Madhav

Hi Madhav, @madhavakrishnan

Thanks a lot for the answer.

In the bug scenario, the mvrlu_deref() invocation that causes the issue is done before any mvrlu_try_lock() on the same object in the same mvrlu critical section. It intends to read the latest committed (before the critical section starts) version of the object before the same thread locks the object. So this bug scenario does not match any of the scenarios you mentioned.

Furthermore, even in the first scenario you mentioned, the copy_obj pointer should not point to the write set header itself (type mvrlu_wrt_set_t) but rather to the uncommitted copy_obj in the write set. I may not have made this clear in the issue description: by "the latest copy of an object points to a write set in the log" I meant "the latest copy of an object points to a write set header (type mvrlu_wrt_set_t)". Type mvrlu_wrt_set_t is defined here: https://github.com/cosmoss-vt/mv-rlu/blob/84ab18acce44c3d641380631550386d8a9225739/lib/mvrlu_i.h#L82

Do you have any other ideas on why this issue could happen in such a scenario?

Regarding the last part of your answer, I don't see why reading an uncommitted object violates transaction semantics, since the read happens within the same transaction as the writes that modify the object.

Cheers,
Lei

Hello Lei,
When you say "It intends to read the latest committed (before the critical section starts)", do you mean you are trying to deref an object outside the mvrlu critical section? The mvrlu critical section starts when you call mvrlu_reader_lock; I just want to make sure your application calls mvrlu_deref within the mvrlu_reader_lock and mvrlu_reader_unlock boundary. If this is not the case, I suggest you fix that first.

But assuming you call mvrlu_deref correctly, I have one more question: do you know whether the master_obj you are trying to deref is try_locked by some other writer? If so, there may be a race condition. For instance, when thread 1 (a reader thread) derefs the master object, it reads p_copy to start the version chain traversal. Perhaps a writer concurrently updates p_copy while committing its copy_obj, and the reader unfortunately sees the old p_copy rather than the updated one. To confirm this, I'd suggest adding debug statements or placing mvrlu_assert calls at the right places. You could also try issuing a memory barrier (e.g., sfence/mfence) after p_copy is updated in the ws_unlock function.
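
As an illustration of the ordering concern, here is a minimal sketch using C11 atomics instead of explicit fence instructions. It is not what ws_unlock does; the types and function names are hypothetical, and only p_copy is a name from the discussion. The writer fully initializes the new copy before publishing it with a release store, and the reader picks up the head with an acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct pub_copy {
    int value;                 /* payload, kept trivial for the sketch */
    struct pub_copy *p_older;  /* next older copy (hypothetical name) */
} pub_copy_t;

typedef struct {
    _Atomic(pub_copy_t *) p_copy;  /* head of the version chain */
} pub_master_t;

/* Writer side: link the new copy to the current head, then publish it.
 * The release store orders the initialization of *c before the new head
 * becomes visible. Assumes a single writer per master at a time, which
 * the object lock would provide. */
static void pub_publish(pub_master_t *m, pub_copy_t *c)
{
    c->p_older = atomic_load_explicit(&m->p_copy, memory_order_relaxed);
    atomic_store_explicit(&m->p_copy, c, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above, so a
 * reader that sees the new head also sees its initialized fields. */
static pub_copy_t *pub_read_head(pub_master_t *m)
{
    return atomic_load_explicit(&m->p_copy, memory_order_acquire);
}
```

A reader that still observes the previous head is seeing an older but valid version, so this pattern alone would not explain a pointer to an unrelated object; it only rules out readers observing a partially initialized copy.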

To me, this bug looks intricate. I usually approach such bugs by positing many possible scenarios and eliminating them one by one, writing specific unit tests or placing assert/debug statements to trace and gather more information. This is how I fixed many such bugs in my other MVCC project.

To give more information or suggest more detailed scenarios, I may need to get hands-on with the code to understand the root cause of this bug. If you can provide more debug and trace info, I can try to be more concrete.
Let me know if you have any further questions.

@maximilian1064

Best regards,
Madhav

@madhavakrishnan @sanidhya

Hi Madhav,

Sorry I have been away from this issue for some time. Yes, I confirm that I use rlu_deref correctly.

I spent some time digging further and have a couple of new findings:

  • In rlu_deref(), I added an assert that the copy object returned belongs to the object being deref'ed. This fails in about 1 in 20 executions of my application, which should never happen. See here: https://github.com/dslab-epfl/mv-rlu/blob/5a499ee79eb58120296b7d42127d44619dd4e28f/lib/mvrlu.c#L1570 for the assert. When the assert failed, the returned copy objects were mostly copies of other objects and sometimes a write set block (the initial bug case I mentioned in this thread). Also, in my scenario all objects are allocated at initialization and never freed during execution.

  • The bug is likely related to my change to mv-rlu (master...dslab-epfl:fnso). I rewrote the data structure/application with the vanilla mv-rlu and the bug disappears (this is a temporary workaround and we need our mv-rlu change to work).
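
The ownership check from the first finding can be sketched as follows. This is a hypothetical, non-aborting variant rather than the assert linked above: the chk_ types, the p_master field name, and the counters are assumptions made for illustration. Counting mismatches instead of aborting makes it possible to keep running and collect trace information across many derefs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical copy header: each copy records its master object. */
typedef struct {
    const void *p_master;  /* assumed field name */
} chk_copy_hdr_t;

/* Running totals for the check, e.g. one instance per thread. */
typedef struct {
    unsigned long checked;     /* number of derefs verified */
    unsigned long mismatched;  /* derefs whose copy had another master */
} chk_stats_t;

/* Record whether the copy returned by a deref belongs to the master that
 * was deref'ed. Returns true on a match and false on a mismatch. */
static bool chk_record(chk_stats_t *s, const chk_copy_hdr_t *copy_hdr,
                       const void *master)
{
    bool ok = (copy_hdr != NULL && copy_hdr->p_master == master);
    s->checked++;
    if (!ok)
        s->mismatched++;
    return ok;
}
```

On a mismatch, the caller could additionally log the master pointer, the returned copy pointer, and the current clock before deciding whether to abort.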

Here is the code that triggered the bug: https://drive.google.com/file/d/1rlwsXYPtY3JDvR_updrwPVllSG-eQhky/view?usp=sharing

Could you give any other ideas on the bug based on these findings and the code? Also, could you share a more recent mv-rlu version with me if there is one? Maybe the bug is already fixed in a newer version.

Many thanks,
Lei

@madhavakrishnan @sanidhya

Hi folks,

I added a microbenchmark I used for debugging this issue. I didn't succeed in triggering the issue with it, but maybe you can tune and modify it to trigger the bug and help with debugging. It's here: https://github.com/maximilian1064/mvrlu-dchain-debug (see its README for usage)

Cheers,
Lei

@madhavakrishnan @sanidhya

Hi folks,

I modified the microbenchmark from my last comment and it can now trigger this bug (in about 1 in 3 executions, using 8 cores). The key is to have some delay (5 secs) between de-/allocation of the same index. Please pull the changes.

Cheers,
Lei