Comparison with "Concurrent Reference Counting and Resource Management in Constant Time"

Question

Comparison with "Concurrent Reference Counting and Resource Management in Constant Time"

jrmuizel opened this issue 4 years ago · 5 comments

It would be interesting to know how arc-swap compares to the approach in https://arxiv.org/pdf/2002.07053.pdf

Answer 1 · 2020-07-08T06:32:04.000Z

I'll try to find some time to read it, but it's a bit long and heavy, so I can't promise when it would happen. Feel free to have a look first. There's a long comment in the code explaining the inner workings of arc-swap.

One thing that I've noticed when skimming, they claim lock-free reads, while arc-swap claims wait-free ones (unless it's the first read on the given thread, in which case it is merely lock-free). But that was just by scrolling through the PDF 🤷

Answer 2 · 2020-07-08T06:48:03.000Z

There was a blog post about it just yesterday: https://pvk.ca/Blog/2020/07/07/flatter-wait-free-hazard-pointers/

Answer 3 · 2020-07-17T19:27:18.000Z

I've had the time to go through the blog post. It's not exactly easy reading and I'll need to go through it few more times. So these are just few impressions:

Both the blog post and arc-swap is wait-free on readers. Arc-swap is not lock-free on writers, the thing described there is.
Arc-swap has a SeqCst atomic CAS on the readers, the blog post uses only acquire/release on the hot path, so it should be faster.
However, in practice I'd expect arc-swap's writers to be much faster, because the membarier syscall must wait for all other cores to do a context switch ‒ which can take up to one quantum of the scheduler. That may be something like 1-10ms.
Reading through that blog post gives me goosebumps, I'm not really persuaded it is correct. There's a place which can read already released data (the blog post even points it out) and suggests to hook into SIGSEGV. The membarier syscall or equivalent doesn't have to be present on all OSes, or be even possible to implement ‒ maybe it could use fully dynamic tics and not tick in case there are more cores than threads.
It postpones destruction of the data into some kind of limbo list that is cleaned up at arbitrary future time by a background thread. Arc-swap guarantees destruction as soon as everyone stops using it, so the difference there is somewhat between GC vs. Arc. I'm not sure if the approach from the blog post could be adapted to this approach.

As I said, these are more of feelings/impressions. If someone would be willing to experiment with it and compare, it would be great, but I'm not sure I'll find the time soon. I don't think this will fit well into this library, but it does seem interesting.

Answer 4 · 2020-07-22T12:00:30.000Z

I'm reading through the article now and that seems a bit more promissing (in particular, using that kind of atomic copy may bring some improvements). And I'm starting to think that users could want to pick different trade-offs (some might be OK with delayed destruction if that brings them better read performance). So I will think if it would be possible to abstract the locking strategy into a trait and be able to add some more to pick from in the future (either directly in the crate or by a third-party crate).

Answer 5 · 2021-01-03T17:17:23.000Z

I've mined some ideas from there and used them.

They propose wait-free/wait-free, arc-swap is currently wait-free/lock-free (readers/writers) and I suspect I'd have to make the readers slower to accomplish that. But it is an improvement already :-).

If someone wants to try another implementation that is truly based on the article, not just inspired by some ideas from there, it should be possible to be done as part of arc-swap as an experimental strategy, but I'll close this for now.