[SR-15166] Crash in _dispatch_wait_for_enqueuer on Android armeabi-v7a
triplef opened this issue · 5 comments
| Previous ID | SR-15166 |
| Radar | None |
| Original Reporter | @triplef |
| Type | Bug |
| Status | Resolved |
| Resolution | Done |
Environment
- Android 9 and 10
- ABI: armeabi-v7a/NEON
- Various devices, mostly using the Cortex-A53 CPU
Additional Detail from JIRA
| Votes | 0 |
| Component/s | libdispatch |
| Labels | Bug |
| Assignee | None |
| Priority | Medium |
Issue Description:
We’re seeing the following crash in libdispatch _dispatch_wait_for_enqueuer() on Android armeabi-v7a due to __builtin_arm_wfe() causing a SIGILL:
Exception Type: Unknown (SIGILL)
Application Specific Information:
IllegalInstruction
Thread 0 Crashed:
0 libdispatch.so +0x0027374 _dispatch_wait_for_enqueuer (yield.c:47)
1 libdispatch.so +0x001e7d2 [inlined] _dispatch_main_queue_drain (queue.c:6797)
2 libdispatch.so +0x001e7d2 _dispatch_main_queue_callback_4CF (queue.c:6960)
... (application-specific runloop)
Below is the disassembly of the library around the crash address 0x0027374. It looks like the compiler unrolled the loop, and only the second WFE instruction seems to crash.
_dispatch_wait_for_enqueuer:
0002735c ldrex r1, [r0] ; DATA XREF=dword_129c4
00027360 cbz r1, loc_2736a
loc_27362:
00027362 mov r0, r1 ; CODE XREF=_dispatch_wait_for_enqueuer+22,
00027364 clrex
00027368 bx lr
; endp
loc_2736a:
0002736a wfe ; CODE XREF=_dispatch_wait_for_enqueuer+4
0002736c ldrexhs r1, [r0] ; DATA XREF=dword_129bc
00027370 cmp r1, #0x0
00027372 bne loc_27362
00027374 wfe <<<<<< !!!!!!!!! CRASH !!!!!!!!!
00027376 ldrexhs r1, [r0]
0002737a cmp r1, #0x0
0002737c bne loc_27362
0002737e wfe
00027380 ldrexhs r1, [r0]
00027384 cmp r1, #0x0
00027386 bne loc_27362
00027388 wfe
0002738a ldrexhs r1, [r0]
0002738e cmp r1, #0x0
00027390 bne loc_27362
00027392 wfe
00027394 ldrexhs r1, [r0]
00027398 cmp r1, #0x0
0002739a bne loc_27362
0002739c wfe
0002739e ldrexhs r1, [r0]
000273a2 cmp r1, #0x0
000273a4 bne loc_27362
000273a6 wfe
000273a8 ldrexhs r1, [r0]
000273ac cmp r1, #0x0
000273ae bne loc_27362
000273b0 wfe
000273b2 ldrexhs r1, [r0]
000273b6 cmp r1, #0x0
000273b8 bne loc_27362
000273ba wfe
000273bc ldrexhs r1, [r0]
000273c0 cmp r1, #0x0
000273c2 bne loc_27362
cc @compnerd
@buttaface I saw that you’ve been doing some work with libdispatch on Android – any thoughts on this? This is our most frequent crash on Android, but we’re unsure what to do here.
I've only been adding build tweaks to keep it running, so I'm not familiar with how libdispatch works internally, nor have I ever heard of this wfe instruction. A search turned up this GitHub issue where rocksdb switched from wfe to yield, which performed much better. You could try the same by replacing __builtin_arm_wfe() with dispatch_hardware_pause(), as is done for other architectures later in that function, or simply comment that builtin out, then rebuild libdispatch to see if that helps.
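To make the suggestion above concrete, here is a minimal, self-contained sketch of a bounded spin-wait that uses a "pause"-style hint (yield on ARM) instead of wfe. It is not the libdispatch source: example_hardware_pause() and example_wait_for_value() are hypothetical names standing in for dispatch_hardware_pause() and _dispatch_wait_for_enqueuer(), and the sketch uses a plain C11 atomic load rather than the ldrex/clrex pair visible in the disassembly. The inline assembly assumes a GCC- or Clang-compatible compiler.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPU spin-wait hint; NOT libdispatch's macro. */
static inline void example_hardware_pause(void)
{
#if defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    /* YIELD is a hint instruction; unlike WFE it never waits for an event. */
    __asm__ __volatile__("yield" ::: "memory");
#elif defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    /* No known hint for this target: fall back to a compiler barrier only. */
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/*
 * Hypothetical sketch: poll *slot up to max_spins times and return the first
 * non-NULL value observed, or NULL if the spin budget runs out. A real
 * implementation would fall back to a slower wait path instead of giving up.
 */
static void *example_wait_for_value(_Atomic(void *) *slot, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        void *value = atomic_load_explicit(slot, memory_order_relaxed);
        if (value != NULL) {
            return value;
        }
        example_hardware_pause();
    }
    return NULL;
}
```

Whether yield actually avoids the SIGILL on the affected devices is something only a rebuilt libdispatch on that hardware can confirm; the sketch only shows the shape of the replacement.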