Do the loads in catchup need to be sequentially consistent?
ericm1024 opened this issue · 2 comments
static inline void __lfring_catchup(struct lfring * ring,
lfatomic_t tail, lfatomic_t head)
{
struct __lfring * q = (struct __lfring *) ring;
while (!atomic_compare_exchange_weak_explicit(&q->tail, &tail, head,
memory_order_acq_rel, memory_order_acquire)) {
head = atomic_load(&q->head);
tail = atomic_load(&q->tail);
if (__lfring_cmp(tail, >=, head))
break;
}
}
I noticed that the loads of head and tail don't specify a memory order, which means they default to memory_order_seq_cst according to cppreference.
I read the paper but I haven't fully absorbed the algorithms yet— @rusnikola do you think these loads need to be sequentially consistent for correctness, or can they be reduced to acquire loads?
On x86 this doesn't matter, but for weaker architectures a sequentially consistent load is more expensive than an acquire load (see this godbolt link for an example for ARM).
Btw, Thanks for open-sourcing this code.
Hi Eric (@ericm1024),
Thanks for your interest in the code. There is no particular reason why it uses the default memory_order_seq_cst; feel free to change it to a weaker memory order (but we need to double-check if the algorithm is still correct). At some point, I would want to check if memory barriers can be further weakened in the code.
Ruslan
@ericm1024 : I checked in a change which fixes that (in addition to other changes). But I think barriers can still be optimized further. Thanks for letting me know!