ivmai/libatomic_ops

test_malloc abort on sparc

ivmai opened this issue · 3 comments

ivmai commented

Source: master (69e1880)
Host: Linux 5.18.0-3-sparc64-smp (Debian)
Compiler: clang 13.0.1
How to build: CC=clang ./configure --enable-assertions --disable-atomic-intrinsics && make -j check
Fail rate: ~3/4
Note: cannot reproduce with gcc-12

Output 1 (test_malloc):
Performing 1000 reversals of 1000 element lists in 16 threads
Testing AO_malloc/AO_free
Aborted

Output 2 (test_malloc):
Performing 1000 reversals of 1000 element lists in 16 threads
Testing AO_malloc/AO_free
Segmentation fault

Possibly related issue: #44 (a SIGSEGV in test_malloc on another architecture)

ivmai commented

As of source: master (620ae9d)
How to build: CC=clang ./configure --enable-assertions && make -j check CFLAGS_EXTRA="-D AO_DISABLE_GCC_ATOMICS"
Observed at least on the gcc102 machine (gcc farm).

Not reproduced with any of the following:
CFLAGS_EXTRA="-D AO_USE_ALMOST_LOCK_FREE"
CFLAGS_EXTRA="-D AO_DISABLE_GCC_ATOMICS -D AO_NO_SPARC_V9"
CFLAGS_EXTRA="-D AO_DISABLE_GCC_ATOMICS -D AO_GENERALIZE_ASM_BOOL_CAS"

ivmai commented

Changing the code in AO_stack_pop_explicit_aux_acquire as follows works around the issue:
if (AO_EXPECT_FALSE(!AO_compare_and_swap_release(list, first, next)))
->
if (AO_EXPECT_FALSE(first!=AO_fetch_compare_and_swap_release(list, first, next)))
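
For illustration, here is a minimal sketch of a pop loop that uses the second form, where the CAS returns the value it found and the comparison with first is done in C. This is not the libatomic_ops code: all sketch_* names are hypothetical, the GCC/Clang __atomic builtin stands in for the asm-based primitive so the sketch runs on any host, and the ABA protection of the real AO_stack is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AO_fetch_compare_and_swap_release: returns the
   value found at *addr; new_val is stored only if that value equals old_val. */
static uintptr_t sketch_fetch_cas(volatile uintptr_t *addr, uintptr_t old_val,
                                  uintptr_t new_val)
{
  uintptr_t expected = old_val;

  /* On failure the builtin overwrites expected with the current value;
     on success expected still holds old_val, which is what was found. */
  __atomic_compare_exchange_n(addr, &expected, new_val, 0 /* strong */,
                              __ATOMIC_RELEASE, __ATOMIC_RELAXED);
  return expected;
}

struct sketch_node {
  struct sketch_node *next;
  int value;
};

/* Simplified pop (assumption: no ABA protection, unlike the real AO_stack). */
static struct sketch_node *sketch_pop(volatile uintptr_t *list)
{
  for (;;) {
    uintptr_t first = __atomic_load_n(list, __ATOMIC_ACQUIRE);
    uintptr_t next;

    if (first == 0)
      return NULL; /* empty list */
    next = (uintptr_t)((struct sketch_node *)first)->next;
    /* The comparison happens here, at C level, not inside the asm. */
    if (first != sketch_fetch_cas(list, first, next))
      continue; /* lost a race, retry */
    return (struct sketch_node *)first;
  }
}

/* Single-threaded check: pops two nodes in order, then sees an empty list. */
static int sketch_pop_demo(void)
{
  struct sketch_node b = { NULL, 2 };
  struct sketch_node a = { &b, 1 };
  volatile uintptr_t list = (uintptr_t)&a;
  struct sketch_node *p1 = sketch_pop(&list);
  struct sketch_node *p2 = sketch_pop(&list);
  struct sketch_node *p3 = sketch_pop(&list);

  assert(p1 == &a && p2 == &b && p3 == NULL);
  return p1->value * 10 + p2->value;
}
```

Calling sketch_pop_demo() from a test's main exercises the loop; it only checks single-threaded behavior and does not reproduce the failure reported here.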

Asm code (original):

.LBB5_14:
	and %i4, -8, %g3
	ldx [%g3], %g4
	mov	%i4, %g5
	!APP
	membar #StoreLoad | #LoadLoad
	casx [%i0],%g5,%g4
	membar #StoreLoad | #StoreStore
	cmp %g5,%g4
	be,a 0f
	mov 1,%g5
	clr %g5
	0:
	
	!NO_APP
	cmp %g5, 0
	bne	.LBB5_16
	nop
	!APP

Asm code after the test workaround:

.LBB5_14:
	and %i4, -8, %g3
	ldx [%g3], %g4
	!APP
	membar #StoreLoad | #LoadLoad
	casx [%i0],%i4,%g4
	membar #StoreLoad | #StoreStore
	
	!NO_APP
	cmp %i4, %g4
	be %xcc, .LBB5_16
	nop
	ba .LBB5_15
	nop
.LBB5_15:

(The difference is that the 2nd variant does not use %g5: %i4 is passed to casx directly, and the comparison is done outside the asm block.)

ivmai commented

Hello @kernigh and @hboehm,
If you have any insight about the root cause of this failure, please let me know.
For now I'm going to apply a workaround by simplifying the asm code in AO_compare_and_swap_full (moving the comparison of the old value and the CAS result to C level).
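
A rough sketch of the shape such a workaround could take: the boolean CAS becomes a thin C wrapper over a primitive that only returns the value found in memory. The sketch_* names are hypothetical, and a GCC/Clang __atomic builtin is assumed here in place of the SPARC casx asm so that the example can be compiled on any host; the actual change in the library may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value-returning primitive. In the real code this would be
   the asm block reduced to the membar/casx/membar sequence; a compiler
   builtin is used here only as a portable stand-in. */
static uintptr_t sketch_fetch_cas_full(volatile uintptr_t *addr,
                                       uintptr_t old_val, uintptr_t new_val)
{
  uintptr_t expected = old_val;

  __atomic_compare_exchange_n(addr, &expected, new_val, 0 /* strong */,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected; /* value that was in *addr before the operation */
}

/* Boolean CAS: the old-value comparison is left to the compiler instead
   of being hand-coded inside the asm. */
static int sketch_cas_full(volatile uintptr_t *addr, uintptr_t old_val,
                           uintptr_t new_val)
{
  return sketch_fetch_cas_full(addr, old_val, new_val) == old_val;
}

/* Single-threaded check of the success and failure paths. */
static int sketch_cas_demo(void)
{
  volatile uintptr_t word = 5;
  int ok = sketch_cas_full(&word, 5, 7);    /* matches: stores 7 */
  int stale = sketch_cas_full(&word, 5, 9); /* word is 7 now: no store */

  assert(word == 7);
  return ok * 2 + stale;
}
```

With this layering the compiler chooses the registers and the compare-and-branch sequence itself, which matches the shape of the second asm listing above.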