artsy/eigen

App launch time increased

alloy opened this issue · 142 comments

alloy commented

The LaunchImage is shown way longer than before. I wonder if this could be DYLD loading all the frameworks.

alloy commented

Ouch, yeah, 4.3 sec to load all images 😭

total time: 4.6 seconds (100.0%)
total images loaded:  290 (235 from dyld shared cache)
total segments mapped: 162, into 2414 pages with 312 pages pre-fetched
total images loading time: 4.3 seconds (93.8%)
total dtrace DOF registration time: 0.14 milliseconds (0.0%)
total rebase fixups:  91,464
total rebase fixups time: 55.97 milliseconds (1.2%)
total binding fixups: 261,574
total binding fixups time: 111.74 milliseconds (2.3%)
total weak binding fixups time: 0.33 milliseconds (0.0%)
total bindings lazily fixed up: 0 of 0
total initializer time: 118.93 milliseconds (2.5%)
    libSystem.B.dylib : 76.47 milliseconds (1.6%)
libBacktraceRecording.dylib : 0.57 milliseconds (0.0%)
       libc++.1.dylib : 0.05 milliseconds (0.0%)
      libobjc.A.dylib : 0.03 milliseconds (0.0%)
       CoreFoundation : 0.87 milliseconds (0.0%)
               vImage : 0.01 milliseconds (0.0%)
     libGLImage.dylib : 0.14 milliseconds (0.0%)
libFosl_dynamic.dylib : 0.02 milliseconds (0.0%)
            CoreImage : 0.01 milliseconds (0.0%)
        CoreTelephony : 0.01 milliseconds (0.0%)
     HockeySDK_Source : 0.56 milliseconds (0.0%)
                Artsy : 30.45 milliseconds (0.6%)
total symbol trie searches:    87386
total symbol table binary searches:    0
total images defining weak symbols:  15
total images using weak symbols:  46
alloy commented

Looks like we’ll have to enable prebinding 😭 The images should have a vmaddr other than 0 and a unique one for each.

For more info see https://developer.apple.com/library/ios/documentation/MacOSX/Conceptual/BPFrameworks/Concepts/FrameworkBinding.html#//apple_ref/doc/uid/20002256-106894

otool -l /Users/eloy/Library/Developer/Xcode/DerivedData/Artsy-dvyztngmkwpjoxercvsdsplrzfzz/Build/Products/Debug-iphoneos/AFNetworking.framework/AFNetworking
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 872
  segname __TEXT
   vmaddr 0x0000000000000000
   vmsize 0x0000000000030000
  fileoff 0
 filesize 196608
  maxprot 0x00000005
 initprot 0x00000005
   nsects 10
    flags 0x0
otool -l /Users/eloy/Library/Developer/Xcode/DerivedData/Artsy-dvyztngmkwpjoxercvsdsplrzfzz/Build/Products/Debug-iphoneos/AFHTTPRequestOperationLogger.framework/AFHTTPRequestOperationLogger 
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 792
  segname __TEXT
   vmaddr 0x0000000000000000
   vmsize 0x0000000000008000
  fileoff 0
 filesize 32768
  maxprot 0x00000005
 initprot 0x00000005
   nsects 9
    flags 0x0
[SNIP]
alloy commented

There’s also the option of lazy loading the frameworks, i.e. on demand, but that’s not automatable.

alloy commented

This prints out the list of vmaddrs of the frameworks in the app after letting DYLD figure it out without prebinding, which could be used to assign the -seg1addr flags to the corresponding framework’s OTHER_LDFLAGS:

#import <mach-o/dyld.h>
#import <inttypes.h>
static void
PrintImageLoadAddresses(void)
{
    const char *frameworks_path = [[[[NSBundle mainBundle] bundlePath] stringByAppendingPathComponent:@"Frameworks"] UTF8String];
    unsigned long frameworks_path_length = strlen(frameworks_path);
    for (uint32_t i = 0; i < _dyld_image_count(); i++) {
        const char *image_path = _dyld_get_image_name(i);
        if (strncmp(image_path, frameworks_path, frameworks_path_length) == 0) {
            NSString *frameworkName = [[NSString stringWithUTF8String:image_path] lastPathComponent];
            intptr_t slide = _dyld_get_image_vmaddr_slide(i);
            printf("0x%" PRIxPTR ": %s\n", slide, [frameworkName UTF8String]);
        }
    }
}
    0x100474000: AFHTTPRequestOperationLogger
    0x100494000: AFNetworking
    0x1004f4000: AFOAuth1Client
    0x100524000: ALPValidator
    0x100560000: ARASCIISwizzle
    0x100588000: ARAnalytics
    0x1005c4000: ARCollectionViewMasonryLayout
    0x1005f0000: ARGenericTableViewController
    0x100610000: ARTiledImageView
    0x100630000: Analytics
    0x100690000: Artsy_UIButtons
    0x1006b0000: Artsy_UIColors
    0x1006d8000: Artsy_UIFonts
    0x1006f8000: Artsy_UILabels
    0x100718000: Bolts
    0x10076c000: CocoaLumberjack
    0x1007d0000: DHCShakeNotifier
    0x1007f8000: DRKonamiCode
    0x100828000: EDColor
    0x100858000: FBSDKCoreKit
    0x100944000: FBSDKLoginKit
    0x100998000: FLKAutoLayout
    0x1009b0000: FODFormKit
    0x100a10000: FXBlurView
    0x100a34000: HockeySDK_Source
    0x100b9c000: ISO8601DateFormatter
    0x100bcc000: InterAppCommunication
    0x100bf0000: JLRoutes
    0x100c10000: JSBadgeView
    0x100c34000: JSDecoupledAppDelegate
    0x100c5c000: KSDeferred
    0x100c74000: Keys
    0x100c88000: MMMarkdown
    0x100cbc000: Mantle
    0x100d08000: MultiDelegate
    0x100d20000: NAMapKit
    0x100d60000: ORKeyboardReactingApplication
    0x100d90000: ORStackView
    0x100db4000: ObjectiveSugar
    0x100de4000: RSSwizzle
    0x100e0c000: ReactiveCocoa
    0x100f10000: SDWebImage
    0x100f7c000: TPDWeakProxy
    0x100f8c000: TRVSDictionaryWithCaseInsensitivity
    0x100fb8000: TSMiniWebBrowser_dblock
    0x100fdc000: UIAlertView_Blocks
    0x101008000: UICKeyChainStore
    0x101028000: UIView_BooleanAnimations
    0x101048000: VCRURLConnection
    0x101084000: iRate
    0x1010c0000: libextobjc
alloy commented

I set the pre bindings with the following Podfile post install hook, but either I’m doing something wrong or prebinding makes no difference in load time :-/

post_install do |installer|
  vmaddrs = <<-EOS
    0x100474000: AFHTTPRequestOperationLogger
    0x100494000: AFNetworking
    0x1004f4000: AFOAuth1Client
    0x100524000: ALPValidator
    0x100560000: ARASCIISwizzle
    0x100588000: ARAnalytics
    0x1005c4000: ARCollectionViewMasonryLayout
    0x1005f0000: ARGenericTableViewController
    0x100610000: ARTiledImageView
    0x100630000: Analytics
    0x100690000: Artsy_UIButtons
    0x1006b0000: Artsy_UIColors
    0x1006d8000: Artsy_UIFonts
    0x1006f8000: Artsy_UILabels
    0x100718000: Bolts
    0x10076c000: CocoaLumberjack
    0x1007d0000: DHCShakeNotifier
    0x1007f8000: DRKonamiCode
    0x100828000: EDColor
    0x100858000: FBSDKCoreKit
    0x100944000: FBSDKLoginKit
    0x100998000: FLKAutoLayout
    0x1009b0000: FODFormKit
    0x100a10000: FXBlurView
    0x100a34000: HockeySDK_Source
    0x100b9c000: ISO8601DateFormatter
    0x100bcc000: InterAppCommunication
    0x100bf0000: JLRoutes
    0x100c10000: JSBadgeView
    0x100c34000: JSDecoupledAppDelegate
    0x100c5c000: KSDeferred
    0x100c74000: Keys
    0x100c88000: MMMarkdown
    0x100cbc000: Mantle
    0x100d08000: MultiDelegate
    0x100d20000: NAMapKit
    0x100d60000: ORKeyboardReactingApplication
    0x100d90000: ORStackView
    0x100db4000: ObjectiveSugar
    0x100de4000: RSSwizzle
    0x100e0c000: ReactiveCocoa
    0x100f10000: SDWebImage
    0x100f7c000: TPDWeakProxy
    0x100f8c000: TRVSDictionaryWithCaseInsensitivity
    0x100fb8000: TSMiniWebBrowser_dblock
    0x100fdc000: UIAlertView_Blocks
    0x101008000: UICKeyChainStore
    0x101028000: UIView_BooleanAnimations
    0x101048000: VCRURLConnection
    0x101084000: iRate
    0x1010c0000: libextobjc
  EOS

  vmaddrs = Hash[*vmaddrs.split("\n").flat_map { |line| line.match(/(0x\w+): (.+)/)[1..2].reverse }]

  installer.pods_project.targets.each do |target|
    path = target.product_reference.path
    if File.extname(path) == '.framework'
      name = File.basename(path, '.framework')
      if vmaddr = vmaddrs[name]
        target.build_configurations.each do |config|
          config.build_settings['OTHER_LDFLAGS'] = "$(inherited) -seg1addr #{vmaddr}"
        end
      else
        puts "[!] No vmaddr found for: #{name}"
      end
    end
  end
end
total time: 4.4 seconds (100.0%)
total images loaded:  290 (235 from dyld shared cache)
total segments mapped: 162, into 2414 pages with 312 pages pre-fetched
total images loading time: 4.2 seconds (94.8%)
total dtrace DOF registration time: 0.14 milliseconds (0.0%)
total rebase fixups:  91,470
total rebase fixups time: 29.95 milliseconds (0.6%)
total binding fixups: 261,576
total binding fixups time: 94.93 milliseconds (2.1%)
total weak binding fixups time: 0.33 milliseconds (0.0%)
total bindings lazily fixed up: 0 of 0
total initializer time: 107.66 milliseconds (2.3%)
    libSystem.B.dylib : 72.95 milliseconds (1.6%)
libBacktraceRecording.dylib : 0.58 milliseconds (0.0%)
       libc++.1.dylib : 0.05 milliseconds (0.0%)
      libobjc.A.dylib : 0.03 milliseconds (0.0%)
       CoreFoundation : 0.83 milliseconds (0.0%)
               vImage : 0.01 milliseconds (0.0%)
     libGLImage.dylib : 0.14 milliseconds (0.0%)
libFosl_dynamic.dylib : 0.02 milliseconds (0.0%)
            CoreImage : 0.01 milliseconds (0.0%)
        CoreTelephony : 0.01 milliseconds (0.0%)
     HockeySDK_Source : 0.54 milliseconds (0.0%)
                Artsy : 19.80 milliseconds (0.4%)
total symbol trie searches:    87387
total symbol table binary searches:    0
total images defining weak symbols:  15
total images using weak symbols:  46
alloy commented

The timing info can be seen by setting both the DYLD_PRINT_APIS and the DYLD_PRINT_STATISTICS env variables. You’d expect just the latter, but it only seems to get printed when combined with the former. The timing info will be in the middle of all the API usage.

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/dyld.1.html

@alloy I'm just curious because I recently reevaluated CocoaPods dynamic frameworks vs. statically linking all of them in one shared dynamic framework. Sry if this has already been answered or discussed anywhere else. Is this tested with or without an attached debugger? My Podfile looks like this

source "git@github.com:Sparrow-Labs/Specs.git"
source "https://github.com/CocoaPods/Specs.git"

use_frameworks!

def framework_dependencies
    pod "AFNetworking", "2.5.3"
    pod "MBProgressHUD", "~> 0.9.1"
    pod "SGKeychain", "~> 1.1.0"
    pod "KissXML", "~> 5.0.0.2"
    pod "CCVFly", "~> 0.3.3"
    pod "SWFSemanticVersion", "~> 0.3.1"

    pod "SHAlertViewBlocks", "~> 1.2.0"
    pod "SHBarButtonItemBlocks", "~> 1.2.0"
    pod "SHControlBlocks", "~> 1.2.0"
    pod "SHControlBlocks", "~> 1.2.0"
    pod "SHKeyValueObserverBlocks", "~> 2.0.0"

    pod "CloudBridge", "~> 1.2.0"
    pod "CBRRESTConnection", "~> 1.2.0"
    pod "CBRRelationshipResolver", "~> 1.0.0"

    pod "CTOpenSSLWrapper", "~> 1.3.0"
    pod "Flow", "~> 1.6.1"
    pod "SPLBadgeView", "~> 1.0.0"
    pod "SLObjectiveCRuntimeAdditions", "~> 1.1.0"
    pod "SLScrollViewKeyboardSupport", "~> 1.1.0"
    pod "SPLSpeechBubblePopoverController", "~> 2.1.3"
    pod "SPLTextFieldChain", "~> 1.0.0"
    pod "SPLTransparentDropShadowView", "~> 1.0.0"
    pod "SPLFoundation", "~> 1.4.2"
    pod "SPLMobilePrinting", "~> 2.4.4"
    pod "SPLOptionsViewController", "~> 1.1.2"
    pod "SPLStickyHeaderTableViewController", "~> 0.1.3"
    pod "SPLFormular", "~> 0.8.1"

    pod "MultipeerCommunication", "~> 0.4.0"
    pod "MCSwipeTableViewCell", "~> 2.1.2"
end


target "CocoaPodsDemo" do
    framework_dependencies

    pod "pop", "~> 1.0.0", :inhibit_warnings => true
    pod "Shimmer", "~> 1.0.2"
    pod "CDZQRScanningViewController", "~> 1.0.7.3"

    pod "UITextField+SPLAutoCompletion", "~> 1.0.0"
    pod "UIAlertControllerConvenienceAPI", "~> 0.2.0"

    pod "SPLPing", "~> 1.1.1"
    pod "SPLWindow", "~> 1.3.1"
    pod "SPLInlayLabel", "~> 1.1.0"
    pod "SPLMobilePayment", "~> 1.2.0"
    pod "SPLRoundImageCache", "~> 1.0.0"
    pod "SPLPagesViewController", "~> 1.0.0"
    pod "SPLTiltedTabsViewController", "~> 1.0.0"
    pod "SPLZoomingViewControllerTransition", "~> 1.1.2"
    pod "SPLBlurredBackgroundViewController", "~> 1.0.0"
    pod "SPLTableViewRowReorderGestureRecognizer", "~> 1.0.0"
end

target "DynamicFramework" do
    framework_dependencies
end

which results in 52 CocoaPods targets. Launch time on my end with a basic demo app with debugger is ~10s. Without debugger <1s which is reasonably fast enough for my use case.

orta commented

I noticed this initially on a device beta which doesn't have a debugger attached, though I think it's definitely worth a look.

@alloy, got the "total images loading time” down from 2.5 seconds to 0.9 seconds by statically linking all frameworks’ object flles into the app. Add the following as a "run script" before the app is linked:

ls -t $OBJROOT/Pods.build/$CONFIGURATION$EFFECTIVE_PLATFORM_NAME/Pods-*.build/Objects-normal/$CURRENT_ARCH/*.o | egrep -v '/(_?FBSDK|BF|BIT)|Dummy' >$OBJROOT/Pods.build/$CONFIGURATION$EFFECTIVE_PLATFORM_NAME.filelist

and replace all -framework arguments in the .xcconfig’s OTHER_LDFLAGS with -filelist $OBJROOT/Pods.build/$CONFIGURATION$EFFECTIVE_PLATFORM_NAME.filelist

This leaves one pod "Artsy+OSSUIFonts” which uses [NSBundle bundleForClass:self] to locate resources in the pod which can be added to the main project now that ARFontLoader is in the main bundle.

alloy commented

@OliverLetterer I think that might actually be it. It’s fast without a debugger, so possibly @orta was running into another issue and it appeared to me to be this. Not sure, will have to take another look tomorrow when I’m sharper.

alloy commented

@johnno1962 That’s an impressive command! But I wasn’t looking for static linking, we can do that simply by omitting use_frameworks! from the Podfile, the point was more about finding a solution for dynamic linking.

Oops, shows how much I know... Works for Swift micro-frameworks though so could be handy for inlining frameworks in future.

alloy commented

@johnno1962 Good point! And thanks for taking the time anyways, there can never be enough knowledge sharing ;)

alloy commented

Just an update, the performance has been getting better over the betas, and without a debugger attached the impact is only really noticeable the first time an app is launched. Still something that I’d like to improve, but not a blocker for us anymore.

We're currently investigating this as well.

If the system runs low on memory, it can purge the dynamic frameworks out of the shared cache, so linking can potentially take a long time more often than just the first time the app is launched.

@alloy Did you have any possible approaches in mind I could look into, other than prebinding?

I'm looking into lazy loading, but so far all documentation seems to say that symbols are loaded lazily by default unless you use -bind_at_load when generating the library. Another possible cause could using -load_all in the executable, but we don't do either.

orta commented

Could CocoaPods be sneaking in a load_all?

CP itself doesn't do it, but there are tons of Pods which specify -all_load

alloy commented

@stepanhruda Nope, I would probably go down the same avenue you are currently going. The point about the shared cache being purged is a very good one and might have been a thing why I noticed it less at some point, i.e. purely because I had more free memory. Please keep me posted, as this probably means I’d have to look at it soon again as well.

I did run a git grep load_all git grep all_load, but didn't see it actually being passed to the compiler anywhere (although documentation for some Pods mentions it). I wonder if -ObjC could be causing this, although there's no way we could drop that since it loads Objective-C categories.

I tried using DYLD_PRINT_BINDINGS, which could tell me which symbols are actually bound lazily, but that takes forever since it literally prints every single symbol the executable is linked against.

I'm going to do debug dyld to see what the time breakdown for loading all library images looks like.

Prebinding seems to be abandoned since OS X 10.4, so I'm not going to spend time on it, it seems that it did not work for @alloy above anyway.

orta commented

Same FWIW:

~/d/i/a/eigen (master) ⏛  grep -r all_load .
./Pods/Expecta/README.md:6. Add `-ObjC` and `-all_load` to the **Other Linker Flags** build setting for the test target in your Xcode project.
./Pods/Specta/README.md:6. Add `-ObjC` and `-all_load` to the "Other Linker Flags" build setting for the test target in your Xcode project.
alloy commented

@stepanhruda It’s probably a good idea to first verify that lazy loading is actually on by default, on iOS with Objective-C class+categories containing libs, by creating a clean slate project and manually adding a few frameworks.

(So no CP.)

Also, you guys have this tagged iOS 9, but I'm currently compiling against iOS 8 running on an iOS 8 device and seeing long linking times, so I don't think it's necessarily related.

alloy commented

@stepanhruda That’s true, it was more of a conflation with our frameworks+iOS 9 sprint. I’ll remove it to avoid further confusion.

I wonder if dyld isn't in fact what's taking that long? Here's a console output:

Wed Sep 23 10:44:25 EDT 2015
_dyld_register_func_for_add_image(0x399acda1)
Wed Sep 23 10:44:25 EDT 2015
_dyld_register_func_for_add_image(0x39939c95)
_dyld_get_image_slide(0x2c000)
_dyld_register_func_for_remove_image(0x39939ce1)
_dyld_register_func_for_remove_image(0x39392dd9)
dyld_register_image_state_change_handler(40, 1, 0x39392c51)
dyld_image_path_containing_address(0x24c1000)
Wed Sep 23 10:44:26 EDT 2015
dlopen(/usr/lib/system/introspection/libdispatch.dylib, 0x00000010)
  dlopen(/usr/lib/system/introspection/libdispatch.dylib) ==> 0x1fec93a4

# Omitted, a ton of other images linked here

Wed Sep 23 10:44:29 EDT 2015
dlopen(/System/Library/Frameworks/UIKit.framework/UIKit, 0x00000010)
  dlopen(/System/Library/Frameworks/UIKit.framework/UIKit) ==> 0x1fec7224
dlsym(0x1fec7224, UIApplicationDidEnterBackgroundNotification)
dlsym(0x1fec7224, UIApplicationWillEnterForegroundNotification)
total time: 20.2 seconds (100.0%)
total images loaded:  299 (231 from dyld shared cache)
total segments mapped: 201, into 6675 pages with 532 pages pre-fetched
total images loading time: 14.6 seconds (71.9%)

Although the reported time is 20.2 seconds, the wall clock (provided by shell's date) seems to have only advanced by 4 seconds since the first dyld function call.

Diving into dyld to see what total images loading time actually measures:

uint64_t t0 = mach_absolute_time();
this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths);
context.notifyBatch(dyld_image_state_dependents_mapped);

// we only do the loading step for preflights
if ( preflightOnly )
    return;

uint64_t t1 = mach_absolute_time();

// [...]

fgTotalLoadLibrariesTime += t1 - t0;

// [...]

printTime("total images loading time", fgTotalLoadLibrariesTime, totalTime);

I think the slowness is happening inside ImageLoader::recursiveLoadLibraries, but I can't seem to hit any symbols to profile this method further 😢

Right now I'm thinking the library images get loaded with the image of the executable, before Xcode attaches lldb. Perhaps I'd need to jailbreak the device and do even deeper hackery to profile this.

orta commented

Some notes from @samdmarshall

I'd file a new radar on it -- dyld will have to allocate the same amount of memory for paging regardless of frameworks or you linking it all statically

so it seems like a problem with the page allocation not being performant

on iOS i don't know if you can without jailbreaking

you can look at the imp of dyld on opensource.apple.com

the only additional difference would be that you need to open more file descriptors with frameworks that if you were to statically link

so your limitations are file IO speed to open then map the file to a memory page (and allocation of said page) happening repeatedly rather than once

which might be a problem with dyld in general

so the system resources: file IO to access the file, allocate the page, load it to the page. then the software time to map the binary, and repeat the process of resolving dependencies and then doing symbol mapping for that in the global symbol table

orta commented

And just to ensure all context is in one place, @marcelofabri filed a DTS on it this here - artsy/eidolon#491 (comment)

alloy commented

@stepanhruda Can you possibly paste a brain dump of how you arrived at the idea of where the problem occurs?

I’m gonna try to make some time next week to dive into this issue again as well.

My line of thought right now:

  1. Based on DYLD_PRINT_STATISTICS, we know that total images loading time is what hangs for so long. In everyone's provided outputs, it covers 70-90% of dyld's reported running time.
  2. Looking through dyld's source, total images loading time simply prints the value of fgTotalLoadLibrariesTime.
  3. fgTotalLoadLibrariesTime measures only a handful of synchronous calls, by far the most suspicious being ImageLoader::recursiveLoadLibraries.
  4. In ImageLoader::recursiveLoadLibraries every library is loaded through context.loadLibrary, which I believe is a function pointer to load (my C++ skills are nonexistent), which looks up the frameworks in @rpath.
  5. Going back, ImageLoader::recursiveLoadLibraries does what the name suggests – it traverses the whole dependency graph and loads all library images. I'm not able to profile which part is slow, because all of this happens before lldb gets attached, but I have a couple of ideas what is slow:
  • if the traversal is generally efficient, disk access to a couple of dozen library images could be slow enough, as @orta & @samdmarshall suggests above
  • traversing the dependency graph can at worst be an O(N^2) operation – if it happens that there is an inefficiency (unnecessary disk access) or e.g. if checking whether an image is already loaded is expensive, the cost quickly grows out of proportions

I also successfully copied symbols into the app executable as @johnno1962 suggested above, which cut the load time down by 400%. But I think he mentioned this didn't work for him in production builds in the end?

I gotta say I'm slightly out of ideas for how one could successfully ship an executable with tons of microframeworks.

I believe the images are checked if loaded by string comparison on the install path of the library. I would want to investigate if the repeated page creation was the expensive part, or if the libraries being loaded were not aligned with page size and this incurring a higher cost.

As mentioned to alloy and orta earlier, I would want to investigate if this is a platform problem (page allocation and perf) or a dyld problem (symbol and dependency resolution) before trying to dive into the incomplete source code provided by Apple further.

As a suggestion, I did toy with the idea of using static linking along with reording the classes of a project to be the order in which they are used (determined by producing an old style profile file by swizzling +alloc). Flash drives are random access anyway so this wouldn’t be a big gain unless you have an app with facebook scale but paging would at least be more orderly. Question is how big is a class relative to page size. If Facebook is 13000 classes in 100 meg you can work it out...

@johnno1962 I'm not sure I understand. I did succeed at moving the symbols into the final executable as you suggest in Accelerator. I thought this didn't work for you because in the end dyld decided to load the library images anyway. Is there another issue?

Other than that, I did see a massive drop in image load time after I copied all the symbols over before building the final binary.

It’s Good news for Accelerator you did manage to observe a drop in load times statically linking! A couple of people observed in artsy/eidolon#491 (comment) that this was likely related to the debugger loading symbols more than anything and less of an issue outside the debugger. My comment above was making a further suggestion to reorder the classes in the project according to when they are used but I doubt this would make any difference unless you have a very big executable with very small classes.

Would love to help if anyone needs someone to investigate anything. We're experiencing similar issues with our app.

I am going to experiment with copying all symbols into the app executable today and report back.

alloy commented

@stepanhruda Btw, for better profiling, I was going to look at it by creating an empty application, embed all of your real frameworks but only link them dynamically at runtime. Then one should hopefully be able to profile the full loading experience.

Good news (hopefully, fingers crossed).

I managed to manually deploy a proof-of-concept enterprise build, using the hack @johnno1962 suggests in https://github.com/johnno1962/Accelerator, and it seems to resolve the immediate issue, although it for sure shouldn't be a long-term solution. Here is my next course of action:

  1. Integrate the hack into our automatic build pipeline, so we can test it in beta builds (of a private repo).
  2. If beta builds perform well, submit to iTunes Connect to verify apps doing this don't get rejected.
  3. Abstract the solution so other developers facing the same issue get unblocked. Right now my plan is to ship a cocoapods plugin.
  4. Investigate manual dyld image loading after launch as @alloy suggested above. File a radar.

Still waiting on my workaround to be approved or rejected by TestFlight review.

In the meantime I created an example project that makes the issue dead simple to reproduce and wrote a (hopefully) comprehensible summary: https://github.com/stepanhruda/dyld-image-loading-performance

Any feedback is appreciated, will be filing a radar soon.

orta commented

http://opensource.apple.com/release/ios-90/ <- new source for dyld might be in here

I was investigating the source through https://github.com/opensource-apple/dyld

(You can't run & debug dyld that iOS uses without jailbreak though.)

The github isn't reliably up to date with the latest sources. (also, dyld only gets published as part of OS X packages)

Our app with 62 dynamic frameworks (73 binaries when including libSwift* and the app itself) also suddenly takes 3-4 seconds to load on iOS 8 and 9. When looking for the cause we found this issue.

Here our observations:

  • Whenever we hit "pause" in the debugger during app start, we end up in ImageLoaderMachO::loadCodeSignature -> fcntl
  • Device log is spammed with entries like this: amfid[…] <Error>: SecTrustEvaluate [leaf IssuerCommonName SubjectCommonName] -> 73 times - surprise!
  • Subsequent app starts are a lot faster. Obviously something is cached. The device log also no longer shows the entries above on app start.
  • Few app starts later the cache seems to be purged. The app starts slow again, spams a lot in the device log and hangs in ImageLoaderMachO::loadCodeSignature most of the time.

So this looks a lot like an issue with Code Signing :(

alloy commented

@fluidsonic Oh, nice sleuthing! Are you going to do something like request a DTS by any chance?

@alloy what is a DTS? :)

I filed an additional radar 22948371.

This has to do with system integrity and verification of everything being run. You won't find that there is a way around this for complex projects. A thought I had the other day was to see if this problem persists if you are using tree based dependencies rather than a flat list. +20 libraries is a lot for the app to link directly, so if that was changed to be linked to reduce the number of dependencies per node of the overall graph if it would make this a bit better.

alloy commented

@fluidsonic It’s ‘Developer Technical Support’, which is something you only get to do so many times a year, so it’s nice to coordinate and not waste any :) https://developer.apple.com/support/technical/

Having said that, I just remembered that @marcelofabri has requested one, maybe he could follow-up on that one, as the answer he got did not cover any of this: artsy/eidolon#491 (comment)

@alloy ah, sure! Never needed them the past 6 years but this is a good case to request a ticket. I also think it makes sense to wait for a follow-up of @marcelofabri's ticket first.

orta commented

IMO - can't hurt to have more than one DTS. We should do one too.

Sent from my iPhone

On Oct 2, 2015, at 13:03, Marc Knaup notifications@github.com wrote:

@alloy ah, sure! Never needed them the past 6 years but this is a good case to request a ticket. I also think it makes sense to wait for a follow-up of @marcelofabri's ticket first.


Reply to this email directly or view it on GitHub.

rdar://22920190 and DTS filed from me 🔮

@fluidsonic if this is a code signing problem it could still be mitigated by consolidating frameworks into a single binary. Less framework linking - less codesign checks. Static linking would be improving lauch time, just not how it appeared at the first glance.

We finally got approved for TestFlight beta testing with the workaround, going to App Store soon.

I wrote a full description of the workaround – it's in README for https://github.com/stepanhruda/dyld-image-loading-performance.

Nice! As of yesterday I converted our app to use frameworks instead of static libraries and we are linking 8 frameworks (soon it should be 10 or 11) and i'm not noticing any issues with launch time at all. Have we determined any point where the load time really becomes noticeable? (I know that we have tested with 20+ frameworks being linked I'd figure we might want to determine where we start noticing it to file with the radar and help people work around this until a fix can be implemented)

@samdmarshall The example repo has 40+ frameworks and it is slow on an older device (iPad 2), you can test it out. It makes sense that 10 frameworks on an iPhone 6 or 6s will be barely noticeable.

Unfortunately, DTS didn't come up with a silver bullet.

Hello,
It’s great to get such detailed information like this, and its clear you have a good grasp on the framework loading process. Engineering loves when this information is provided in a bug report, which I see you’ve done.

My first email was focused around providing you with clues on how tow workaround this issue, which is why I was focused on initializer time, and ways to get around the app startup watchdog timer going off — these are the things that you can control and workaround, such as by managing your app’s dependencies and using dlopen( ) to load each framework only when it is needed by the app. Hopefully that means you can spread out the load time across the amount of time your app is used.

Beyond that, there is nothing more I can do for you at this time to make framework loading go faster. You can follow up on your bug report on bugreport.apple.com if you wish to know its status.

Great, that is so disappointing :( I think those load times are unacceptable, and a proof that they didn't test that behavior before shipping support for iOS dynamic frameworks...

I simply think this was a use case they did not plan to support, but it turned out to be one in the real world.

What was not supposed to be a real-world use case? Swift modules?!

Anyway are there any current workarounds which don't require rewriting a lot of code or break resource loading from bundles?

What affects the time of code signature verification? The size of the whole framework? The size of the dylib in it? Or is it nearly constant?

Before this otherwise productive and helpful thread devolves into more off-the-cuff comments I'd like to remind everyone that this may be out of the control of being fixed as it may be due to AMFI enforcement doing its job.

@stepanhruda I strongly recommend that you communicate your findings and other information from this thread in the radar you opened and proceed from there. If this gets stuck at any point please let me know and I will see what I can do to help.

Great, that is so disappointing :( I think those load times are unacceptable, and a proof that they didn't test that behavior before shipping support for iOS dynamic frameworks...

In hindsight I realize that my comment was ambiguous (and not very productive) so I wanted to apologize. When I said "they", I didn't mean the cocoapods folks, I meant Apple releasing iOS 8 with 3rd party dynamic frameworks support.

To be blunt, @JaviSoto your comments weren't called for regardless who it was meant for. As mentioned in this and related issues this seems to arise when working with an extraordinarily large number of dylibs that get embedded into the application bundle. This isn't a mater of "it wasn't tested", it is a matter of something else impacting the load process. As I have mentioned repeatedly now, it is likely that it may be AMFI or something else specific to iOS - otherwise this would have come up in the last 15 years we could do this on OS X.

This configuration of dependencies is highly unusual, flat rather than more of a tree, and this is definitely an edge case -- an edge case that is coming up repeatedly due to how CocoaPods works (again I'll note that we haven't had something like this on OS X). I would really appreciate it if we could get this back on track for collecting the data we need to properly report this to Apple. Thanks!

Hi @samdmarshall, thanks for the message.

I apologize again for my completely useless message. It came from absolute frustration. An app I'm working on has no more than 10 frameworks, and I'm also seeing >3s launch time slow-downs caused by this, and I hoped that the DTS would result in some workaround. I'll keep monitoring the ticket in case there's any debug information I can provide to further help with the issue.
Again, I apologize for my out-of-place message. Thanks for the work you guys are putting into investigating this issue <3

Our app (test version AND App Store version) was suddenly no longer launching at all on my iPhone 5s today. I tapped the app icon and for ~8 seconds nothing happened - not even the launch screen appeared. Then the launch screen appeared and some time later the app was killed for launching for too long.

In the device log I could see that the dylib loading time went up to ~8 seconds per dylib.

Then I noticed that I still had the Network Link Conditioner set to "100% Loss". After turning it off the launch time was back to 3-4 seconds as before. Turning it on again didn't cause the issue to happen again. Maybe something got cached now…

Bottom line is that the dylib loading and thus the app launch time is sometimes increased even further when there is a bad network connection.

@fluidsonic Maybe your specific dependencies are firing network requests in +initialize? I doubt there's anything network related in dyld.

I would find it highly unusual if dyld made ANY network requests at all. Seeing that where the slowdowns are occurring (being main() in the app is even called) are not from the app itself it seems unlikely to have any bearing on this. I believe your launch times, but I find it extremely unlikely that network link conditioner is responsible for slowing dyld down. Still if you could upload a sample that reproduced this issue I'd really like to dig into this.

I know how weird that sounds, but I've also thought about this before this from anecdotal evidence. When in very slow networks, launching an enterprise-signed app takes much longer. Subsequent fresh launches are quick, so like @fluidsonic mentioned, there seems to be some caching involved.
Something else that made me think that iOS makes some network requests when launching the app is that once I revoked an enterprise certificate, and an app previously signed with it immediately stopped launching. How else would Apple do that if they don't "call home" to validate the signature?

Certificate validation

The first time a user opens an app, the distribution certificate is validated by contacting Apple’s OCSP server. If the certificate has been revoked, the app won’t launch. Inability to contact or get a response from the OCSP server isn’t interpreted as a revocation. To verify the status, the device must be able to reach ocsp.apple.com. See Network configuration requirements.

The OCSP response is cached on the device for the period of time specified by the OCSP server— currently, between three and seven days. The validity of the certificate isn’t checked again until the device has restarted and the cached response has expired. If a revocation is received at that time, the app won’t launch.

Revoking a distribution certificate invalidates all of the apps you’ve signed with it. Revoke a certificate only as a last resort—that is, if you’re sure the private key is lost or you think the certificate has been compromised.

-- iOS 9 Technical Deployment Guide

Yes it does check to validate the certificate, but that should be part of AMFI or another related security system and not dyld itself. It does mean enterprise apps might need to be more careful to prevent the watchdog from killing the app for taking too long.

Why enterprise apps specifically?
Our app is a regular app from the App Store on a regular device which is not part of MDM.

Why doesn't Apple just check for revocation in parallel while the app already launches / runs?
Why do we have to wait for a network request to finish (or even fail!) just to launch an app?!
I'm fine with revocation checks, but they should be asynchronous esp. since a network failure "isn’t interpreted as a revocation" anyway.

@fluidsonic because enterprise provisioning profiles don't use UDIDs to associate the ability to run an app to the device itself. With the Developer program you have to provision you approved devices with your account, and these are listed as part of the signed provisioning profile that you use to install the app and allow it to be run on the device. Enterprise distribution doesn't do it this because it isn't feasible to register every device (hundreds, thousands) to a single profile to be evaluated when it gets installed. Instead they authenticate the certificate used is still valid (not expired or revoked) and the user's own approval of the profile (trusting it) on the device.

Validation checks cannot happen in parallel because it means you could be executing malicious code while the check is still running to see if the certificate is valid or not. This results in a minor annoyance if you have no network connectivity but device security and protecting the user (and their data) comes first.

@stepanhruda Any code execution in the app would happen after the launch screen appears. Since that doesn't even happen something is going on before the app's code (or any framework's code) is actually executed.

@samdmarshall:

I also don't expect that dyld does any network requests on its own. But we already know that dyld communicates synchronously with amfi for code signature verification so if amfi performs a synchronous network request dyld would hang too.

Again this is not about enterprise distribution. I don't use enterprise distribution yet I had this issue.

Revocation checks can happen in parallel. Right now as per the info you quoted when the revocation check is impossible right now (e.g. due to a network problem) then the app will still launch. Another check will happen later.
I don't see how a parallel check is much different from this.
Why should a 100% inaccessible network cause the app to launch successfully without any delay and a timeout due to 100% packet loss add 8sec load time per dylib and result in a kill?
Revocation checks are always asynchronous by design.

It's also troubling that the network request is done not once for the whole app but repeatedly for all dylibs. While this is no issue when the first request succeeds (due to caching) it causes the extra delay for loading each subsequent dylib because network failures aren't cached.

The result is not a minor annoyance but really slow app starts - up to being killed.
I'm pretty sure this can be designed to perform a lot better.

@fluidsonic There are some things you seem to be misunderstanding about the scenario you described:

  • The based on what you have said so far you are asserting that 100% Packet Loss == Inaccessible Network Access, which is not true. The device thinks it has network access on some interface, even though it is configured to be 100% packet loss this doesn't mean the network is inaccessible, it only means the requests aren't returning with a response from the destination. This is not the same as an inaccessible network, which would be no active network interface (airplane mode).
  • I am not making the assumption that this is an enterprise problem you are experiencing. I am only commenting to answer your questions related to the topic because of what Javi posted.
  • Revocation checks (in general) could be make in an asynchronous process as to not block. However these checks are being done on the signing certificate validation in order to run executable code. If the certificates are not valid, then under no circumstances should the code ever be loaded into memory at all. AMFI must validate all files for correct signatures before they can even be processed by dyld. If a signature fails then dyld cannot open a file descriptor to the library, let alone put it in memory.
  • As mentioned the process i described is what happens for enterprise certificates, if you are building with a developer (not enterprise) account, then this will be different. Apps from the store won't have this issue either.

I can assure you that the engineers at Apple that are responsible for this process are extremely good at what they do and are aware how what the needs and performance optimizations around this process. I would rather this thread not continue to deviate with more trivia about the code signature validation process unless it proves relevant to further investigation. I do invite you to file a radar about this as well and to share any diagnostic reports you might have about this issue to see if it is related otherwise you best bet would be ask on the apple dev forums.

@samdmarshall We mean the same. I don't mean inaccessible network and 100% packet loss to be the same. Just the result is the same: the request fails. But in the former case the app starts without delay while in the latter case the app was killed for launching too long.

You say that this issue is limited to enterprise certificates. I had the problem with an App Store app (plus three test apps installed through Xcode or Ad-hoc). No enterprise. We don't know whether OCSP is to blame here but the issue is not limited to enterprise app distribution.

We do misunderstand regarding signature verification. I'm not talking about performing signature verification asynchronously - that would indeed be insecure. I'm talking solely about the certificate revocation checks. A revoked certificate means that the app was executable earlier so it shouldn't matter if it runs a few milliseconds longer. Also since an inaccessible network is already a valid way to bypass revocation checks (at least for a while) then asynchronous revocation checks don't make much of a difference. They would just speed up the app start.

I agree that most Apple engineers are really good but that doesn't mean that everything is perfect.

This issue is about unnecessary slow app starts which are fact. In order to find workarounds to mitigate the issue we need to understand what's happening. I'm glad people are contributing their experience and background knowledge here and so far the code signature verification is most likely to blame.

We already filed a radar but since Apple's response times are usually between three months and two years we're looking for alternative solutions.

A tip (via @orta) for anyone that's able to use static libraries for most of their pods but needs one or more pods to be built as frameworks:

  • Delete use_frameworks! from your Podfile and remove the framework-required pods
  • Create a Podfile that includes the framework-required pods
  • Generate those frameworks using Rome
  • Drag the frameworks to your target
  • Add the frameworks in a copy file phase to your target with destination type Frameworks
  • Optionally thin the fat files with some run script

Doing this I managed to get down start times on an iPhone 5s with about 40 pods from ~20 seconds to about ~15 seconds. I was a bit surprised it didn't shave off more load time, but our Swift pod contains 12 frameworks. And simply loading the swift .dylibs takes several seconds on a clean restarted iPhone 5s. 😢

Hopefully this will help someone while we all wait for a better solution.

The workaround I talked about above has now officially passed App Store review!

orta commented

Related:

https://twitter.com/ashfurrow/status/656947776736096256

@jckarter Hey there! Not sure if this is in your wheelhouse, but we’re having trouble with large # of frameworks: https://t.co/FwZe697W2N

https://twitter.com/jckarter/status/656948781271810049

@ashfurrow We've had other similar reports. Can you statically link your frameworks for the time being?

https://twitter.com/NachoSoto/status/656950568821260288

@jckarter @ashfurrow it'd be awesome if Swift modules can be statically linked at some point 😍

https://twitter.com/jckarter/status/656950742645739520

@ashfurrow It should be OK to link Swift libs together statically, as long as libSwiftCore and friends aren't.

https://twitter.com/jckarter/status/656953215582232576

@NachoSoto @ashfurrow I don't speak xcodebuild, but you should be able to link against a .o built by swiftc -c instead swiftc -emit-library.

alloy commented

Great to hear @stepanhruda!

So it seems that static linkage is going to be the only pragmatic solution for the foreseeable future (dyld optimisations, if any, would probably take a while to make its way to us) and has been proven to work/allowed.

@neonichu @samdmarshall I’m wondering if the process of building a static lib can be simplified and integrated into CP itself. E.g. Maybe override the build commands to invoke swiftc -emit-library instead, if that’s an exposed Xcode setting? Or otherwise have CP create an aggregate target that contains a script to perform the work that @johnno1962’s POC does?

@alloy I'd be a bit wary to integrate something that Apple states is officially not supported (static Swift libraries) into CP proper. However, a plugin doing this seems feasible.

alloy commented

@neonichu Fair enough, that’s probably a better way to tackle the implementation thereof anyways.

I agree with @neonichu, this shouldn't be made part of CocoaPods. I'd hesitate to make it a plug-in also, given it could leave a lot of code in a bad place (seems irresponsible to me).

  • Due to having no guaranteed stable ABI, it seems extremely unwise to even think of using static libraries for swift.
  • While technically possible to build swift static libraries by having swiftc output the object files rather than a binary, due to the ABI fragility I would recommend against any action that would result in people using this to ship pre-built libraries.
  • As mentioned in the tweets, and through my own interactions with the build system interface for swiftc, doing any such compilation would be best done outside of Xcode's normal target system. You could do this with a make file and an external build system target. This would allow us to our build with our own flags and then package up a static library with libtool like you would normally.

As I understand this conversation, you are suggesting only use these swift static libraries as a build artifact and not as a target to be built separately (I don't think you mean to support swift static libraries as a feature). That could work, though as I mentioned above I would really recommend against that due to the issues that would arise in the future from people treating it as a means to solve some archaic problem nobody fully understands.

alloy commented

I'm not suggesting this as a solution for prebuilt binaries, only for builds from source as is normally preferred with CP.

alloy commented

I hear you on the possibility for issues due to lack of understanding, which is probably another reason to prefer a plugin.

It sounds like we are on the same page about this then 👍

alloy commented

As an FYI to whomever ends up implementing such a plugin, if anyone at all, another issue with static linkage that we came up with is resources. Static linkage would bring us back to the state before frameworks where all resources basically belong to the same bundle (the main bundle). Since this plugin would be intended only as a workaround, until Apple either improves loading time or responds with a yay/nay, code that uses +[NSBundle bundleForClass:] should probably still be supported.

A possible hack would be to swizzle bundleForClass:, return a proxy that only overrides the resources related methods, and delegates the rest to the real bundle object. I don’t foresee problems, but as with any such hack, caution and proper testing of one’s app is required. The plugin should come with a very big fat warning note anyways, so that’s not anything new because of the addition of such a hack.

alloy commented

(Btw, I’ll be frank, I don’t use Swift and have no immediate need for any of this, so I am personally not going to work on this any time soon. Please step up if you do want to dedicate some time into this.)

FWIW, if a static library can be created with swiftc, then it should still be possible to bundle the binary in a framework and get the resource loading benefits. You can do this today with Objective-C frameworks by changing the Mach-O binary type from dynamic (the default) to static.

alloy commented

@sharplet The issue is that fake frameworks (those with static binaries) cannot use bundleForClass: and expect that to return the framework bundle, as the class will be defined in the main app binary.

@alloy Yep, I follow. Ah well.

Theoretically, would this work?

  • compile all Swift files to .o files
  • link all .o files into a single, real, dynamic framework

We're still loading a dynamic framework, but it's only one. We'd risk resource collisions in the framework, but they're detectable at least, and I'd personally take that (catchable) risk over adding seconds to the app load time.

I believe the individual libraries could be compiled to static .as as an intermediary if we wanted to and then be linked into a single framework alongside a single dummy class. Not sure which is easier.

@efirestone AFAIK, one framework implies one module, so this wouldn't be possible without changing the source code because certain import statements wouldn't apply anymore.

Also, there can be name clashes between files, duplicated symbols, etc.

@efirestone That's basically what I'm doing, except I don't even create that single framework, I just link all of the symbols into the app binary.

orta commented

@stepanhruda did you automate this? ( not trying to pressure )

Depends on how you define automate. If you check the sample repo there are about 5 steps to take to work around the issue. You need to add a post_install step to Podfile, add a run script build phase and link resources manually, and possibly weakly link an aggregate target. It's not as easy as using a single cocoapods plugin, but at least no changes get reverted after you run pod install.

orta commented

👍

@stepanhruda have you gotten an update on the radar for this yet?

No response so far

To keep anyone subscribed posted – last week Apple engineers requested a build reproducing the issue without explicitly calling dlopen. I submitted that, it's available as v0.2 of the example repo.

Trying to understand the problem a little more: Is this purely an issue with using frameworks through CocoaPods, or would someone run into this with any sort of dependency management system (or even just linking in dozens of frameworks manually)?

alloy commented

@sethfri It has nothing to do with CocoaPods, it’s purely code-signed frameworks.