kaist-cp/cs431

[Question] Memory leak in optimistic list set

I am implementing the optimistic list set. Currently all the tests exit and pass successfully (cargo and cargo_tsan), but cargo_asan reports a memory leak from the insert operation in the stress_concurrent test. I think the problem may be in the insert operation. Here is how I do insert; suppose the list is 1 -> 3 -> 5 -> 7 -> 9 and I want to insert 6 (see the sketch after this list):

  • traverse and move to node 7: acquire the read lock of node5.next and read node7
  • validate the read of node7; if it fails, loop back to step 1
  • try to upgrade the read lock; if it succeeds, insert node6 between node5 and node7 and return true; if it fails, go back to step 1
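
Something like the following toy model illustrates that validate-and-retry loop. To be clear, this is my own minimal sequence counter, not the homework's actual SeqLock API; Seq, begin_read, and validate are made-up names:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy seqlock-style sequence counter: even = no writer, odd = writer active.
struct Seq(AtomicUsize);

impl Seq {
    // Step 1: snapshot the sequence number before reading.
    fn begin_read(&self) -> Option<usize> {
        let s = self.0.load(Ordering::Acquire);
        (s % 2 == 0).then_some(s)
    }

    // Step 2: the reads in between are consistent iff the number is unchanged.
    fn validate(&self, snapshot: usize) -> bool {
        self.0.load(Ordering::Acquire) == snapshot
    }
}

fn main() {
    let seq = Seq(AtomicUsize::new(0));
    loop {
        let Some(snap) = seq.begin_read() else { continue };
        // ... read node5.next and node7 here ...
        if seq.validate(snap) {
            break; // reads were consistent: try to upgrade and insert node6
        }
        // a writer interfered: retry from step 1
    }
}
```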

And below is how I do remove (the reclamation step is sketched after the list):

  • use find to move the cursor to the target node
  • try to upgrade the cursor.prev read lock to a write lock; if it fails, return false
  • read the node the cursor points to, acquire the write lock of its next lock, and do the removal
  • use defer_destroy to drop the target node
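
For the last step, the usual crossbeam-epoch idiom looks roughly like this (a stripped-down sketch with a made-up Node type, not my homework code):

```rust
use crossbeam_epoch::{self as epoch, Atomic, Shared};
use std::sync::atomic::Ordering;

struct Node {
    data: i32,
    next: Atomic<Node>,
}

fn main() {
    let head = Atomic::new(Node { data: 5, next: Atomic::null() });
    let guard = &epoch::pin();
    let target = head.load(Ordering::Acquire, guard);
    // Unlink the node first (here: clear head), then defer its destruction
    // until every thread that is currently pinned has unpinned.
    head.store(Shared::null(), Ordering::Release);
    unsafe {
        guard.defer_destroy(target);
    }
}
```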

If I ignore the delete ops in the test, there is no memory leak, so I guess there must be some race between insert and remove.

Below is the log from cargo_asan for the stress_concurrent test:

running 1 test
test optimistic_fine_grained::stress_concurrent ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 12 filtered out; finished in 0.21s


=================================================================
==96499==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 384 byte(s) in 16 object(s) allocated from:
    #0 0x59687b2f44ef in malloc /rustc/llvm/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
    #1 0x59687b592aac in __rdl_alloc (/home/ctr/arch/cs431/cs431-2024/homework/target/x86_64-unknown-linux-gnu/release/deps/list_set-b764e0293f1f42b6+0x4b9aac) (BuildId: 67fb54466bce656cb505d59e41023c6e02c81c36)
    #2 0x59687b33e27d in cs431_homework::test::adt::set::_$LT$impl$u20$cs431_homework..adt..ConcurrentMap$LT$T$C$$LP$$RP$$GT$$u20$for$u20$S$GT$::insert::h866d198a553dfc6f (/home/ctr/arch/cs431/cs431-2024/homework/target/x86_64-unknown-linux-gnu/release/deps/list_set-b764e0293f1f42b6+0x26527d) (BuildId: 67fb54466bce656cb505d59e41023c6e02c81c36)
    #3 0x59687b35cf44 in std::sys_common::backtrace::__rust_begin_short_backtrace::h95606d96aeb813eb (/home/ctr/arch/cs431/cs431-2024/homework/target/x86_64-unknown-linux-gnu/release/deps/list_set-b764e0293f1f42b6+0x283f44) (BuildId: 67fb54466bce656cb505d59e41023c6e02c81c36)
    #4 0x59687b350ca3 in __rust_try.llvm.14851880513301806781 (/home/ctr/arch/cs431/cs431-2024/homework/target/x86_64-unknown-linux-gnu/release/deps/list_set-b764e0293f1f42b6+0x277ca3) (BuildId: 67fb54466bce656cb505d59e41023c6e02c81c36)

SUMMARY: AddressSanitizer: 384 byte(s) leaked in 16 allocation(s).
error: test failed, to rerun pass `--test list_set`

I've already read the relevant issues about the list set, and I'd appreciate it if you could give any advice on where the bug may be. I've already spent a whole day debugging this (cry..), thank you!

What you might be doing is that in insert, you are (see the sketch after this list):

  1. Creating an Owned<Node<T>> with Node::new.
  2. Converting it to a Shared<_> via Owned::into_shared.
  3. Forgetting to clean up the Shared<_> from step 2 in the case where the insert fails.
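
In code, the leaky shape would be roughly the following (a minimal sketch with a hypothetical Node, not your actual code):

```rust
use crossbeam_epoch::{self as epoch, Owned};

struct Node {
    data: i32,
}

fn main() {
    let guard = &epoch::pin();
    // Steps 1 and 2: create the node and turn it into a Shared pointer.
    let new = Owned::new(Node { data: 6 }).into_shared(guard);
    // Step 3: if the insert fails and you just return or retry here, this
    // allocation leaks -- no other thread has seen it, so nobody frees it.
    // Since the node was never published, it is safe to take it back:
    unsafe {
        drop(new.into_owned());
    }
}
```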

Thanks, but I actually create the new node only after I have successfully acquired the write lock, so in my code the insert cannot fail once the new node is created.

Hmm, then I'm not so sure. What might be happening is that, in the presence of deletes, your insert is actually inserting between already-deleted nodes, leaving the new node dangling off unlinked nodes. For example, another thread removes node 5 and node 7, and your insert then links node 6 into the already-unlinked node 5: node 6 is unreachable from the list, so it is never freed.

Yes, I tried to reason about such a "lost insert" situation where two threads are deleting and inserting at the same position, but I think they should be ordered correctly by my insert and remove logic. Also, I rewrote the insert function so that it only acquires write locks hand over hand; with that version the stress_concurrent test passes without a memory leak. Do you see anything weird in my insert and remove procedures? Thank you for your advice.

stress_concurrent can't really detect such "missing inserts". The same goes for log_concurrent, so you should probably look over your code regardless.

Thank you, I finally solved the problem, though in a very inelegant way. What I do is basically make every read operation sandwich-like: validate the read lock right before reading the data, and validate again after finishing with the data. I wonder whether this is the idiomatic way to use the seqlock, or whether there are some unnecessary validate operations here.

Huh, needing to validate a read lock before reading the associated data seems strange. The traversal stage is very similar to that of the standard fine-grained linked list, except that you need to validate a read lock before releasing it (see the sketch below).
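
To illustrate, again with a toy sequence counter rather than the actual SeqLock API (Seq, begin_read, validate, and read_pair are made-up names): a single validate right before you stop using the read lock covers every read made under it, so the leading validate in the sandwich adds nothing.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Seq(AtomicUsize);

impl Seq {
    fn begin_read(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }

    fn validate(&self, snapshot: usize) -> bool {
        let s = self.0.load(Ordering::Acquire);
        s == snapshot && s % 2 == 0 // unchanged and no writer active
    }
}

// Read two values under one "read lock": validate ONCE, at release time.
fn read_pair(seq: &Seq, a: &AtomicUsize, b: &AtomicUsize) -> Option<(usize, usize)> {
    let snap = seq.begin_read();
    let x = a.load(Ordering::Relaxed);
    let y = b.load(Ordering::Relaxed);
    // If this validation succeeds, both reads above are consistent;
    // validating before each individual read would be redundant.
    seq.validate(snap).then_some((x, y))
}

fn main() {
    let seq = Seq(AtomicUsize::new(0));
    let (a, b) = (AtomicUsize::new(1), AtomicUsize::new(2));
    assert_eq!(read_pair(&seq, &a, &b), Some((1, 2)));
}
```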

Thank you for your help, I'll try to refactor the code. I think this issue can be closed.