Manishearth/triomphe

Performance of `triomphe::Arc` is sometimes worse than `std::sync::Arc`

orium opened this issue · 8 comments

Hi,

@michaelsproul and I ran some benchmarks in rpds . Most results improved when switching from std::sync::Arc to triomphe::Arc, but I was surprised that some benchmarks regressed significantly (> 30%). I'm wondering if there are optimizations on std::sync::Arc that were never ported to triomphe and that could improve triomphe and all projects that use it.

This is the PR with the benchmark results: orium/rpds#88

No idea; this crate was written quite a while ago.

If I take a look at the std::sync::Arc and port any potential optimizations to triomphe will you be available to review and merge them? (Not promising anything as I am busier that usual lately.)

I was browsing std::sync::Arc's recent changes and this one stood out as something we could maybe backport: rust-lang/rust#115546. Although I'm not familiar enough with the memory model to know if this would make a big difference.

This comment suggests is_unique relies on the Acquire semantics:

triomphe/src/arc.rs

Lines 546 to 549 in f78899c

// See the extensive discussion in [1] for why this needs to be Acquire.
//
// [1] https://github.com/servo/servo/issues/21186
Self::count(self) == 1

I would be happy to get patches, I can't guarantee I'll be able to review and merge them in a timely manner but if they map closely to upstream code that would make it easier. Would also love help reviewing them

I concur, I am working on Typst (as a contributor not owner/creator) and specifically on performance, we don't use weak refs to I figured swapping out std arcs for triomphe would help but performance is around 30% worse using triomphe so there is something here, perhaps an optimization that exists in the std lib and not here? As someone said, they have (apparently) relaxed the ordering in the std lib, perhaps we could do the same in Triomphe?
Thanks, Dherse

The two differences I can see are:

  1. In make_mut std uses Acquire only when the ref count is 1, triomphe always acquires. This shouldn't matter on x86.
  2. Triomphe does an acquire load in drop, std an acquire fence. This is very minor.

Are you benchmarking heavily contended accesses or mostly single threaded?

@jmspiewak The rpds benchmarks are mostly single-threaded