mikeash/MAZeroingWeakRef

Crash on iPod 1st gen with iOS 3.0

gradha opened this issue · 8 comments

Hello.

I've been using MAZeroingWeakRef for some time without problems on devices with 4.x and above. Now I have added MAZeroingWeakRef to an older project that has to support every iOS version, and using a weak reference crashes. The crash doesn't happen immediately when creating the weak reference, only when reading it back by calling its target method. I tried going back to commit 3b120d8 to see if that helped, but it didn't (and I don't know how far back I can go without reintroducing other problems that have since been fixed).

If I modify the property getter to always return nil, the program works (with reduced functionality because parent is nil) but still crashes when deallocating the weak reference object.

To the best of my copying/pasting abilities, here's one of the limbo call stacks when it crashes:

Thread 1
 0 <????>
Thread 2
 0 mach_msg_trapA
    0x31d47144  <+0000>  mov    r12, sp
    0x31d47148  <+0004>  push   {r4, r5, r6, r8}
    0x31d4714c  <+0008>  ldm    r12, {r4, r5, r6}
    0x31d47150  <+0012>  mvn    r12, #30    ; 0x1e
    0x31d47154  <+0016>  svc    0x00000080
  * 0x31d47158  <+0020>  pop    {r4, r5, r6, r8}
    0x31d4715c  <+0024>  bx lr
 5 _pthread_body
    0x31d7058c  <+0000>  push   {r4, r7, lr}
    0x31d70590  <+0004>  add    r7, sp, #4  ; 0x4
    0x31d70594  <+0008>  mov    r4, r0
    0x31d70598  <+0012>  bl 0x31d705b8 <_pthread_set_self>
    0x31d7059c  <+0016>  ldr    r0, [r4, #60]
    0x31d705a0  <+0020>  mov    lr, pc
    0x31d705a4  <+0024>  ldr    pc, [r4, #56]
  * 0x31d705a8  <+0028>  mov    r1, r0
    0x31d705ac  <+0032>  mov    r0, r4
    0x31d705b0  <+0036>  pop    {r4, r7, lr}
    0x31d705b4  <+0040>  b  0x31d706fc <_pthread_exit>
Thread 3
 0 mach_msg_trapA
    0x31d47144  <+0000>  mov    r12, sp
    0x31d47148  <+0004>  push   {r4, r5, r6, r8}
    0x31d4714c  <+0008>  ldm    r12, {r4, r5, r6}
    0x31d47150  <+0012>  mvn    r12, #30    ; 0x1e
    0x31d47154  <+0016>  svc    0x00000080
  * 0x31d47158  <+0020>  pop    {r4, r5, r6, r8}
    0x31d4715c  <+0024>  bx lr
 7 _pthread_body
    0x31d7058c  <+0000>  push   {r4, r7, lr}
    0x31d70590  <+0004>  add    r7, sp, #4  ; 0x4
    0x31d70594  <+0008>  mov    r4, r0
    0x31d70598  <+0012>  bl 0x31d705b8 <_pthread_set_self>
    0x31d7059c  <+0016>  ldr    r0, [r4, #60]
    0x31d705a0  <+0020>  mov    lr, pc
    0x31d705a4  <+0024>  ldr    pc, [r4, #56]
  * 0x31d705a8  <+0028>  mov    r1, r0
    0x31d705ac  <+0032>  mov    r0, r4
    0x31d705b0  <+0036>  pop    {r4, r7, lr}
    0x31d705b4  <+0040>  b  0x31d706fc <_pthread_exit>
Thread 4
 0 select$DARWIN_EXTSN
    0x31d6b0cc  <+0000>  mov    r12, sp
    0x31d6b0d0  <+0004>  push   {r4, r5}
    0x31d6b0d4  <+0008>  ldr    r4, [r12]
    0x31d6b0d8  <+0012>  mov    r12, #93    ; 0x5d
    0x31d6b0dc  <+0016>  svc    0x00000080
  * 0x31d6b0e0  <+0020>  pop    {r4, r5}
    0x31d6b0e4  <+0024>  bcc    0x31d6b0fc <select$DARWIN_EXTSN+48>
    0x31d6b0e8  <+0028>  ldr    r12, [pc, #4]   ; 0x31d6b0f4 <select$DARWIN_EXTSN+40>
    0x31d6b0ec  <+0032>  ldr    r12, [pc, r12]
    0x31d6b0f0  <+0036>  b  0x31d6b0f8 <select$DARWIN_EXTSN+44>
    0x31d6b0f4  <+0040>  undefined
    0x31d6b0f8  <+0044>  bx r12
    0x31d6b0fc  <+0048>  bx lr
 3 <????>

Not sure I understand your call stack. Are you able to reproduce the problem? If so, a 't a a bt' (stands for thread apply all backtrace) at the (gdb) prompt would be helpful.

Since this was running in an app, I tried to simplify the test case by forcing the weak reference creation into the main entry point of the app, which reduces the number of running threads. With the following code:

#import "MAZeroingWeakRef.h"

- (void)applicationDidFinishLaunching:(UIApplication *)application
{
    window_ = [[UIWindow alloc] initWithFrame:[[UIScreen mainScreen] bounds]];
    window_.backgroundColor = [UIColor whiteColor];

    DLOG(@"1st post");
    MAZeroingWeakRef *ref = [[MAZeroingWeakRef alloc] initWithTarget:window_];
    DLOG(@"Reference created, trying to read...");
    id result = [ref target];
    DLOG(@"Reference read %@", result);

    sleep(10);
    exit(0);
}

The app crashes right when reading the reference with the following gdb output:

[Switching to process 10755 thread 0x2a03]
[Switching to process 10755 thread 0x2a03]
Re-enabling shared library breakpoint 1
Catchpoint 2 (throw)2011-12-23 21:30:40.674 Irekia[2868:207] 1st post
2011-12-23 21:30:40.699 Irekia[2868:207] Reference created, trying to read...
warning: Cancelling call - objc code on the current thread's stack makes this unsafe.
(gdb) t a a bt

Thread 2 (thread 11267):
#0  0x31d47158 in mach_msg_trap ()
#1  0x31d49ee0 in mach_msg ()
#2  0x30254554 in CFRunLoopRunSpecific ()
#3  0x3025416a in CFRunLoopRunInMode ()
#4  0x3588dbd0 in RunWebThread ()
#5  0x31d705a8 in _pthread_body ()
#6  0x00000000 in ?? ()

Thread 1 (thread 10755):
#0  0x00000000 in ?? ()
(gdb) 

For completeness, here's the previous crash (also reproducible) with proper gdb output (it certainly looks much more readable than what I was getting through Xcode). It may be harder to read since there are other things going on in the app:

(gdb) t a a bt

Thread 5 (thread 12035):
#0  0x31d6b0e0 in select$DARWIN_EXTSN ()
#1  0x3021dd24 in __CFSocketManager ()
#2  0x31d705a8 in _pthread_body ()
#3  0x00000000 in ?? ()

Thread 4 (thread 11779):
#0  0x31d47158 in mach_msg_trap ()
#1  0x31d49ee0 in mach_msg ()
#2  0x30254554 in CFRunLoopRunSpecific ()
#3  0x3025416a in CFRunLoopRunInMode ()
#4  0x3055af3a in +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] ()
#5  0x30554068 in -[NSThread main] ()
#6  0x305023f8 in __NSThread__main__ ()
#7  0x31d705a8 in _pthread_body ()
#8  0x00000000 in ?? ()

Thread 3 (thread 11523):
#0  0x31d47158 in mach_msg_trap ()
#1  0x31d49ee0 in mach_msg ()
#2  0x30254554 in CFRunLoopRunSpecific ()
#3  0x3025416a in CFRunLoopRunInMode ()
#4  0x3055ef02 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:] ()
#5  0x0003a736 in -[FlokiAppDelegate start_background_runloop:] (self=0x117110, _cmd=0x4e330, dummy=0x0) at /Users/gradha/project/efaber/floki/src/global/FlokiAppDelegate.m:610
#6  0x30554068 in -[NSThread main] ()
#7  0x305023f8 in __NSThread__main__ ()
#8  0x31d705a8 in _pthread_body ()
#9  0x00000000 in ?? ()

Thread 2 (thread 11267):
#0  0x31d471b4 in semaphore_wait_signal_trap ()
#1  0x31d786fc in semaphore_wait_signal ()
#2  0x31d49af4 in pthread_mutex_lock ()
#3  0x3588dd10 in _WebTryThreadLock ()
#4  0x3588dc30 in WebRunLoopLock ()
#5  0x3020cd90 in __CFRunLoopDoObservers ()
#6  0x3025486e in CFRunLoopRunSpecific ()
#7  0x3025416a in CFRunLoopRunInMode ()
#8  0x3588dbd0 in RunWebThread ()
#9  0x31d705a8 in _pthread_body ()
#10 0x00000000 in ?? ()

Thread 1 (thread 10755):
#0  0x00000000 in ?? ()

Nasty! Looks like the stack is getting corrupted, so there's no stack trace to present. Can you step through the MAZWR internals invoked by the -target call to see exactly which line crashes? Unfortunately I don't have any 3.0 test device here to try with....

Does it make sense that it crashes when something is deallocated? I put a breakpoint inside -target and properly reached the line return [ret autorelease];. At that point I tried:

(gdb) po ret
<UIWindow_MAZeroingWeakRefSubclass: 0x1188e0; baseClass = UIWindow; frame = (0 0; 320 480); hidden = YES; layer = <CALayer: 0x1042e0>>
(gdb) t a a bt

Thread 2 (thread 11267):
#0  0x31d47158 in mach_msg_trap ()
#1  0x31d49ee0 in mach_msg ()
#2  0x30254554 in CFRunLoopRunSpecific ()
#3  0x3025416a in CFRunLoopRunInMode ()
#4  0x3588dbd0 in RunWebThread ()
#5  0x31d705a8 in _pthread_body ()
#6  0x00000000 in ?? ()

Thread 1 (thread 10755):
#0  -[MAZeroingWeakRef target] (self=0x11b520, _cmd=0x30112d6c) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/MAZeroingWeakRef.m:620
#1  0x00039424 in -[FlokiAppDelegate applicationDidFinishLaunching:] (self=0x117130, _cmd=0x30145798, application=0x115f70) at /Users/gradha/project/efaber/floki/src/global/FlokiAppDelegate.m:209
#2  0x308f15a4 in -[UIApplication _performInitializationWithURL:sourceBundleID:] ()
#3  0x308f117c in -[UIApplication _runWithURL:sourceBundleID:] ()
#4  0x309374b0 in -[UIApplication handleEvent:withNewEvent:] ()
#5  0x30936cf0 in -[UIApplication sendEvent:] ()
#6  0x3093687c in _UIApplicationHandleEvent ()
#7  0x3204696c in PurpleEventCallback ()
#8  0x30254a76 in CFRunLoopRunSpecific ()
#9  0x3025416a in CFRunLoopRunInMode ()
#10 0x308f0354 in -[UIApplication _run] ()
#11 0x308eea94 in UIApplicationMain ()
#12 0x0003bfdc in main (argc=1, argv=0x2ffff0d8) at /Users/gradha/project/efaber/floki/src/main.m:12
(gdb) 

But when the method attempts to return (and, I guess, some dealloc is called) the crash happens. Trying to step into it while pressing shift or control didn't seem to show me anything new other than the gdb message "Not safe to look up objc runtime data."

Certainly could if it was an over-release problem or something of the sort. Is it the -autorelease call that crashes, then?

The autorelease call doesn't crash; it's when the method goes out of scope that something crashes. I retried by linking the main.m implementation you provide with the test cases, renaming its main() to mazmain() and calling it inside my app. As expected, TestBasic crashes when trying to read the ref variable. I put a breakpoint inside -[MAZeroingWeakRef initWithTarget:] and it already crashes in there if I try to "po self" in gdb, since -description tries to call -target, which is what crashes. The crash happens before target is registered or assigned, pointing to the MAZeroingWeakRef class creation code as the culprit. This was tested on a 3.0 iPod and a 3.1.3 iPhone:

(gdb) t a a bt

Thread 2 (thread 12291):
#0  0x33aae488 in mach_msg_trap ()
#1  0x33ab106c in mach_msg ()
#2  0x323f5008 in CFRunLoopRunSpecific ()
#3  0x323f4c1e in CFRunLoopRunInMode ()
#4  0x3018c1dc in RunWebThread ()
#5  0x33ad8788 in _pthread_body ()
#6  0x00000000 in ?? ()

Thread 1 (thread 11779):
#0  -[MAZeroingWeakRef initWithTarget:] (self=0x2235e0, _cmd=0x3322f5ac, target=0x2233c0) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/MAZeroingWeakRef.m:595
#1  0x0008d5e0 in TestBasic () at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:140
#2  0x0008d104 in __Test_block_invoke_0 (.block_descriptor=0x2fffe090) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:80
#3  0x0008d0b2 in WithPool (block=0x2fffe090) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:69
#4  0x0008d062 in Test (func=0x8d595 <TestBasic+1>, name=0xaeb25 "TestBasic") at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:77
#5  0x0008d370 in __mazmain_block_invoke_0 (.block_descriptor=0xcad90) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:498
#6  0x0008d0b2 in WithPool (block=0xcad90) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:69
#7  0x0008d34c in mazmain () at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:497
#8  0x0003c760 in -[FlokiAppDelegate applicationDidFinishLaunching:] (self=0x21fed0, _cmd=0x3322605c, application=0x21eff0) at /Users/gradha/project/efaber/floki/src/global/FlokiAppDelegate.m:216
#9  0x324a7e90 in -[UIApplication _performInitializationWithURL:sourceBundleID:] ()
#10 0x324a7a68 in -[UIApplication _runWithURL:sourceBundleID:] ()
#11 0x324f8e00 in -[UIApplication handleEvent:withNewEvent:] ()
#12 0x324f863c in -[UIApplication sendEvent:] ()
#13 0x324f8094 in _UIApplicationHandleEvent ()
#14 0x335067e4 in PurpleEventCallback ()
#15 0x323f552a in CFRunLoopRunSpecific ()
#16 0x323f4c1e in CFRunLoopRunInMode ()
#17 0x324a6c08 in -[UIApplication _run] ()
#18 0x324a5230 in UIApplicationMain ()
#19 0x00040094 in main (argc=1, argv=0x2ffff044) at /Users/gradha/project/efaber/floki/src/main.m:12
(gdb) print self
$1 = (MAZeroingWeakRef *) 0x2235e0
(gdb) print [self class]
Unable to call function "objc_msgSend" at 0x3138cea8: no return type information available.
To call this function anyway, you can cast the return type explicitly (e.g. 'print (float) fabs (3.0)')
(gdb) print self
$2 = (MAZeroingWeakRef *) 0x2235e0
(gdb) po [self class]
MAZeroingWeakRef
(gdb) print [self target]

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00000000 in ?? ()
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (objc_msgSend) will be abandoned.
(gdb) 

When I repeat this on a 4.3.5 iPad the gdb session is quite different:

(gdb) t a a bt

Thread 5 (thread 13059):
#0  0x32924c00 in mach_msg_trap ()
#1  0x3292475e in mach_msg ()
#2  0x355952be in __CFRunLoopServiceMachPort ()
#3  0x35597568 in __CFRunLoopRun ()
#4  0x35527ec2 in CFRunLoopRunSpecific ()
#5  0x35527dca in CFRunLoopRunInMode ()
#6  0x3374a284 in RunWebThread ()
#7  0x336ed310 in _pthread_start ()
#8  0x336eebbc in thread_start ()

Thread 4 (thread 12803):
#0  0x00000000 in ?? ()

Thread 3 (thread 12547):
#0  0x32927fbc in kevent ()
#1  0x35acd038 in _dispatch_mgr_invoke ()
#2  0x35ace040 in _dispatch_queue_invoke ()
#3  0x35acd5f0 in _dispatch_worker_thread2 ()
#4  0x336ee590 in _pthread_wqthread ()
#5  0x336eebc4 in start_wqthread ()

Thread 2 (thread 12291):
#0  0x329273ec in __workq_kernreturn ()
#1  0x336ee6de in _pthread_wqthread ()
#2  0x336eebc4 in start_wqthread ()

Thread 1 (thread 11779):
#0  -[MAZeroingWeakRef initWithTarget:] (self=0x241740, _cmd=0x3445be70, target=0x241d00) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/MAZeroingWeakRef.m:605
#1  0x0009234a in TestBasic () at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:140
#2  0x00091d3a in __Test_block_invoke_0 (.block_descriptor=0x2fdfd76c) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:80
#3  0x00091cd6 in WithPool (block=0x2fdfd76c) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:69
#4  0x00091c92 in Test (func=0x922cd <TestBasic+1>, name=0xb3b25 "TestBasic") at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:77
#5  0x000920f0 in __mazmain_block_invoke_0 (.block_descriptor=0xcfd20) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:498
#6  0x00091cd6 in WithPool (block=0xcfd20) at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:69
#7  0x00091fc0 in mazmain () at /Users/gradha/project/efaber/floki/external/MAZeroingWeakRef/Source/main.m:497
#8  0x0003ed32 in -[FlokiAppDelegate applicationDidFinishLaunching:] (self=0x21a280, _cmd=0x34448bdf, application=0x217360) at /Users/gradha/project/efaber/floki/src/global/FlokiAppDelegate.m:216
#9  0x3418c85c in -[UIApplication _callInitializationDelegatesForURL:payload:suspended:] ()
#10 0x34186b64 in -[UIApplication _runWithURL:payload:launchOrientation:statusBarStyle:statusBarHidden:] ()
#11 0x3415b7d6 in -[UIApplication handleEvent:withNewEvent:] ()
#12 0x3415b214 in -[UIApplication sendEvent:] ()
#13 0x3415ac52 in _UIApplicationHandleEvent ()
#14 0x3414ee76 in PurpleEventCallback ()
#15 0x35594a96 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ ()
#16 0x3559683e in __CFRunLoopDoSource1 ()
#17 0x3559760c in __CFRunLoopRun ()
#18 0x35527ec2 in CFRunLoopRunSpecific ()
#19 0x35527dca in CFRunLoopRunInMode ()
#20 0x34185d48 in -[UIApplication _run] ()
#21 0x34183806 in UIApplicationMain ()
#22 0x00042886 in main (argc=1, argv=0x2fdff05c) at /Users/gradha/project/efaber/floki/src/main.m:12
(gdb) po self
<MAZeroingWeakRef: 0x241740 -> <NSObject_MAZeroingWeakRefSubclass: 0x241d00>>
(gdb) 

The strange thing is... what are those __Test_block_invoke_0 frames doing? I thought iOS 3.x didn't support blocks, which have only been available since iOS 4.x? Maybe everything is fine but some blocks code is run when it should not be?

The test code isn't built to work without blocks, although the regular code is. But you're right, it's surprising that it works at all. I'm guessing that basic stack blocks work, since that's all on the compiler, but anything that uses the blocks runtime probably crashes.

Looking at the source, I noticed that some blocks code is controlled with USE_BLOCKS_BASED_LOCKING rather than checking for compiler support. Not sure why. Maybe try turning that off and see if it helps?
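A minimal sketch of what a compile-time guard for that switch could look like, written in plain C so it also compiles as Objective-C. Only the macro name USE_BLOCKS_BASED_LOCKING comes from this thread; the conditions and the helper function blocks_based_locking_enabled() are assumptions for illustration, not how the library actually defines the switch. __BLOCKS__ is predefined by compilers when blocks are enabled, and __ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__ carries the deployment target (30000 for 3.0, 40000 for 4.0) on iOS builds.

```c
/* Sketch only: macro name from the thread, conditions assumed. */
#ifndef USE_BLOCKS_BASED_LOCKING
#  if defined(__BLOCKS__) && \
      (!defined(__ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__) || \
       __ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__ >= 40000)
     /* Compiler has blocks and the oldest supported OS ships the runtime. */
#    define USE_BLOCKS_BASED_LOCKING 1
#  else
     /* No blocks in the compiler, or deployment target older than iOS 4.0. */
#    define USE_BLOCKS_BASED_LOCKING 0
#  endif
#endif

/* Hypothetical helper: reports which path was compiled in (1 = blocks). */
int blocks_based_locking_enabled(void)
{
    return USE_BLOCKS_BASED_LOCKING;
}
```

With a guard of this shape, a build with a deployment target of 3.0 would take the non-blocks path even when the SDK and compiler support blocks, at the cost of never using the blocks path on newer devices running the same binary.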

Well, that was easy! Setting USE_BLOCKS_BASED_LOCKING to zero makes the code run again on the iPod. I have no idea about the compiler support, since I'm building with the iOS 5.0 SDK and a deployment target of 3.0, so it seems you would need a runtime check rather than a compile-time one for the code to use the blocks-based implementation only on 4.0 and upwards.

Thanks for the help in finding the problem.
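For reference, the runtime check mentioned above could be sketched roughly as follows. This is not code from MAZeroingWeakRef: the function names symbol_available() and blocks_runtime_available() are hypothetical, and the sketch assumes that looking up the blocks runtime entry point _Block_copy with dlsym() is a reasonable proxy for "the blocks runtime exists on this OS".

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `name` resolves in the running
 * process (main program plus the libraries loaded with it), else 0. */
int symbol_available(const char *name)
{
    assert(name != NULL);

    /* A NULL path asks for a handle to the main program itself. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return 0;

    int found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}

/* Assumption: the blocks runtime is present exactly when _Block_copy
 * can be resolved at run time. */
int blocks_runtime_available(void)
{
    return symbol_available("_Block_copy");
}
```

Code that wants the blocks-based path could then branch on blocks_runtime_available() once at startup and fall back to the non-blocks path otherwise, instead of relying only on the compile-time value of USE_BLOCKS_BASED_LOCKING.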