rusnikola/lfqueue

Do the loads in catchup need to be sequentially consistent?

ericm1024 opened this issue · 2 comments

static inline void __lfring_catchup(struct lfring * ring,
	lfatomic_t tail, lfatomic_t head)
{
	struct __lfring * q = (struct __lfring *) ring;

	while (!atomic_compare_exchange_weak_explicit(&q->tail, &tail, head,
			memory_order_acq_rel, memory_order_acquire)) {
		head = atomic_load(&q->head);
		tail = atomic_load(&q->tail);
		if (__lfring_cmp(tail, >=, head))
			break;
	}
}

I noticed that the loads of head and tail don't specify a memory order, which means they default to memory_order_seq_cst according to cppreference.

I read the paper but I haven't fully absorbed the algorithms yet— @rusnikola do you think these loads need to be sequentially consistent for correctness, or can they be reduced to acquire loads?

On x86 this doesn't matter, but for weaker architectures a sequentially consistent load is more expensive than an acquire load (see this godbolt link for an example for ARM).

Btw, Thanks for open-sourcing this code.

Hi Eric (@ericm1024),

Thanks for your interest in the code. There is no particular reason why it uses the default memory_order_seq_cst; feel free to change it to a weaker memory order (but we need to double-check if the algorithm is still correct). At some point, I would want to check if memory barriers can be further weakened in the code.

Ruslan

@ericm1024 : I checked in a change which fixes that (in addition to other changes). But I think barriers can still be optimized further. Thanks for letting me know!