Performance of `triomphe::Arc` is sometimes worse than `std::sync::Arc`
orium opened this issue · 8 comments
Hi,
@michaelsproul and I ran some benchmarks in rpds . Most results improved when switching from std::sync::Arc
to triomphe::Arc
, but I was surprised that some benchmarks regressed significantly (> 30%). I'm wondering if there are optimizations on std::sync::Arc
that were never ported to triomphe
and that could improve triomphe
and all projects that use it.
This is the PR with the benchmark results: orium/rpds#88
No idea; this crate was written quite a while ago.
If I take a look at the std::sync::Arc
and port any potential optimizations to triomphe will you be available to review and merge them? (Not promising anything as I am busier that usual lately.)
I was browsing std::sync::Arc
's recent changes and this one stood out as something we could maybe backport: rust-lang/rust#115546. Although I'm not familiar enough with the memory model to know if this would make a big difference.
This comment suggests is_unique
relies on the Acquire
semantics:
Lines 546 to 549 in f78899c
I would be happy to get patches, I can't guarantee I'll be able to review and merge them in a timely manner but if they map closely to upstream code that would make it easier. Would also love help reviewing them
I concur, I am working on Typst (as a contributor not owner/creator) and specifically on performance, we don't use weak refs to I figured swapping out std arcs for triomphe would help but performance is around 30% worse using triomphe so there is something here, perhaps an optimization that exists in the std lib and not here? As someone said, they have (apparently) relaxed the ordering in the std lib, perhaps we could do the same in Triomphe?
Thanks, Dherse
The two differences I can see are:
- In
make_mut
std usesAcquire
only when the ref count is 1, triomphe always acquires. This shouldn't matter on x86. - Triomphe does an acquire load in
drop
, std an acquire fence. This is very minor.
Are you benchmarking heavily contended accesses or mostly single threaded?
@jmspiewak The rpds
benchmarks are mostly single-threaded