bazel-ios/rules_ios

Prototype performance of remote compiled clang framework modules

jerrymarino opened this issue · 5 comments

We'd like a prototype to demonstrate the performance implications and identify unknowns of IDE integration issues when compiling framework modules with Bazel.

Consider how these are loaded into clang and swift, how to cache them, and how they work with the IDE ( lldb, etc ). Currently a remote compilation action pays the worst-case cost of O(NDeps) implicit module compilations and is riddled with other cache invalidation issues. If we compile them with Bazel, we'll be able to reliably cache / load state onto workers.

It looks like explicit module compilation support has recently been accelerating in swift-driver - the rewrite of the original C++ driver baked into the swift compiler. That might help validate some of this end to end - or even be a nod toward how to do it in a supported way.

The other potentially helpful bit is the C++ module compilation action inside of Bazel. What has been unclear is how to pull the actual compilation off in starlark: e.g. whether we'll need header pruning for this feature, or how we'd invoke compilation with cc_common 🤔

For system modules - it looks like the bit you'll need to do is now open sourced - and easy to digest in part of buck ❤️
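
To make the shape of that work concrete, here's a minimal sketch of how one PCM-compile command line could be assembled. This is illustrative Python, not the buck or rules_swift code - the flags used ( `-fmodules`, `-Xclang -emit-module`, `-fmodule-name=`, `-fmodule-file=<name>=<path>` ) are real clang options, but the exact invocation should be verified against the clang shipped in the Xcode in use:

```python
def pcm_compile_args(module_name, modulemap_path, output_path, dep_pcms):
    # Hypothetical helper: assemble an argv that precompiles one clang module.
    # Flag spellings are hedged - verify against the clang in your Xcode.
    args = [
        "clang",
        "-fmodules",
        "-Xclang", "-emit-module",       # produce a .pcm instead of an object
        "-fmodule-name=" + module_name,
        "-o", output_path,
    ]
    # Feed previously built PCMs for the module's deps explicitly, so clang
    # never falls back to implicit module compilation.
    for dep_name, dep_pcm in sorted(dep_pcms.items()):
        args.append("-fmodule-file=" + dep_name + "=" + dep_pcm)
    args.append(modulemap_path)
    return args
```

The point of the explicit `-fmodule-file=` arguments is that each module becomes a single remote-cacheable action whose O(NDeps) cost is paid once by Bazel, instead of implicitly on every downstream compile.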

Yesterday I looked into this a bit more, trying to reason about how to fit it into Bazel's paradigms of loading, analysis, and execution phases: both from an Xcode side and an intra-build side. It looks like rules_swift has the "precompile module" clang invocation mostly there - so my pseudo code is focused on the Xcode side

If we decide to use starlark to define build system actions based on external input, e.g. a given Xcode, this part probably needs to go in a workspace rule. Mainly, the requirement here is to read the module dependency graph for a given Xcode and SDK, then generate rules to compile PCMs at the execution phase. If we need to select Xcode PCMs in the analysis phase ( I assumed we do here ), we can return a provider that maps a given module name to a PCM.

Workspace rule implementation

# rules/pcm_repo.bzl
def _find_modules_for_sdk(xcode_path, sdk):

    # Given an Xcode version, path, and SDK, write a fast single-threaded
    # program ( e.g. C++ or swift ) that:
    # - finds sub paths of the SDK module
    # - determines dependencies
    # - returns a parseable format to loop over later

    # compile a fast host program here ( e.g. C++ or swift )
    # run the program and collect standard output

    # Transform the standard output into an array of structs with the
    # *relative* SDK path - read in from the parseable output
    return [
        # example return value entry
        struct(
            name = "UIKit",
            # TBD: what does rules_swift bring to the table on this:
            modulemap = "__BAZEL_XCODE_SDKROOT__/UIKit.framework/Modules/module.modulemap",
            deps = ["Foundation"],
        ),
    ]

def _impl(repo_ctx):
    ### Generates all pcm compile rules - and a hash table that maps a "module_name" -> pcm `Artifact`
    ### to query during the analysis phase later
    modules = _find_modules_for_sdk(get_xcode_path(), repo_ctx.attr.sdk)

    # This is the hash table allowing analysis-phase lookup
    provider_data = {}

    build_file = ""
    for m in modules:
        provider_data[m.name] = ":" + m.name
        # Append a rule to the BUILD file for the future actions
        build_file += """
# This is another Bazel rule that can compile SDK modules
precompile_sys_module(
    name = "{name}",
    deps = {deps},
    modulemap = "{modulemap}",
    is_system = True,
)
""".format(name = m.name, deps = m.deps, modulemap = m.modulemap)

    ## We probably need a hash table of modules to the module files
    build_file += """
sys_pcms(name = "pcms", pcm_data = {})
""".format(provider_data)

    # write the BUILD file
    repo_ctx.file("BUILD", content = build_file)

# Define a PCM repo - a BUILD file generated for a given Xcode version
# I chose to make a per-SDK repo rule - because I am biased to let Bazel parallelize
# a simple single threaded C++ searching program to write 1 BUILD file per SDK
# in a given Xcode
pcm_repo = repository_rule(
    implementation = _impl,
    attrs = {
        # You'll need this Xcode config to know what version and path to read
        # modules from
        "_xcode_config": attr.label(default = configuration_field()),
        "sdk": attr.string(),  # ios,iPhoneSimulator,macOS,watchOS,tvos
    },
)
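
The scanner behind `_find_modules_for_sdk` doesn't have to be C++ to start: a throwaway Python version is enough to validate the output shape. A sketch, assuming frameworks sit under the SDK's `System/Library/Frameworks` and leaving dependency discovery to a later pass:

```python
import os
import re

def find_modules_for_sdk(sdk_root):
    # Walk <sdk>/System/Library/Frameworks looking for module maps and pull
    # the module name out of each. Dependency discovery would be a second
    # pass ( e.g. by scanning the headers or asking clang ).
    modules = []
    fw_dir = os.path.join(sdk_root, "System/Library/Frameworks")
    if not os.path.isdir(fw_dir):
        return modules
    for entry in sorted(os.listdir(fw_dir)):
        modulemap = os.path.join(fw_dir, entry, "Modules/module.modulemap")
        if not os.path.exists(modulemap):
            continue
        with open(modulemap) as f:
            match = re.search(r"framework module (\w+)", f.read())
        if match:
            modules.append({
                "name": match.group(1),
                # keep the path SDK-relative so it can be prefixed with
                # __BAZEL_XCODE_SDKROOT__ when the BUILD file is generated
                "modulemap": os.path.relpath(modulemap, sdk_root),
            })
    return modules
```

Sorting the directory listing keeps the generated BUILD file stable across runs, which matters for repository rule caching.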

Next I'd declare the PCM repos - e.g. `pcm_repo` - in the WORKSPACE

## WORKSPACE

#  1 PCM repo for each sdk
pcm_repo(name = "local_config_pcm_iOS", sdk = "iPhoneOS")
pcm_repo(name = "local_config_pcm_iOSSim", sdk = "iPhoneSimulator")
pcm_repo(name = "local_config_pcm_macOS", sdk = "macOS")

Selecting an SDK module during the analysis phase

Next I thought to write a rule to select a set of PCMs during the analysis phase for a given set of sdk_modules - introspecting the current platform. This might have improvements available depending on what Bazel version you use

def _pcm_deps_impl(ctx):
    pcm_deps = []

    # Determine if it's a simulator by the platform type and device bit
    arch = ctx.fragments.apple.single_arch_cpu
    platform = str(ctx.fragments.apple.single_arch_platform.platform_type)
    is_sim = platform == "ios" and not ctx.fragments.apple.single_arch_platform.is_device

    # The PCM provider is a hash table to the sdk_module files - so our rules can
    # determine what to add in the analysis phase - e.g. if this is required by
    # platforms as depicted above by introspecting the transition
    pcm_provider = None
    if is_sim:
        pcm_provider = ctx.attr.pcm_repo_ios_sim[PCMProvider]
    else:
        pcm_provider = ctx.attr.pcm_repo_ios[PCMProvider]

    # Select a PCM for an SDK module - causing it to be an input to headers
    for sdk_module in ctx.attr.sdk_frameworks:
        pcm_deps.append(pcm_provider.pcms[sdk_module])

    # We feed all the pcm_deps to clang via cc_common.compile and to swiftc via
    # swift_common.compile - in other layers of the build system
    return [CcInfo(headers = pcm_deps)]


pcm_deps = rule(
    implementation = _pcm_deps_impl,
    fragments = ["apple"],
    attrs = {
        # sdk_frameworks for a given apple_framework, swift_library, etc
        "sdk_frameworks": attr.string_list(),

        "pcm_repo_ios": attr.label(default = "@local_config_pcm_iOS//:pcms"),
        "pcm_repo_ios_sim": attr.label(default = "@local_config_pcm_iOSSim//:pcms"),
        "pcm_repo_macos": attr.label(default = "@local_config_pcm_macOS//:pcms"),
        # If you don't use one then set it to `None` to prevent burning resources
        "pcm_repo_tvos": attr.label(default = None),
    },
)
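
The branching in `_pcm_deps_impl` is easy to sanity-check outside Bazel. A plain-Python model of the same selection, with hypothetical provider contents standing in for what the repo rules would emit:

```python
def select_pcms(sdk_frameworks, platform, is_device, providers):
    # Mirrors _pcm_deps_impl: pick the simulator vs device provider, then
    # resolve each requested SDK framework to its precompiled module.
    is_sim = platform == "ios" and not is_device
    pcms = providers["ios_sim" if is_sim else "ios"]
    return [pcms[name] for name in sdk_frameworks]

# Hypothetical PCMProvider contents, keyed the way the repo rules would emit them
providers = {
    "ios": {"UIKit": "ios/UIKit.pcm", "Foundation": "ios/Foundation.pcm"},
    "ios_sim": {"UIKit": "sim/UIKit.pcm", "Foundation": "sim/Foundation.pcm"},
}
```

Each lookup is a constant-time dict hit, so the analysis-phase cost stays linear in the number of requested frameworks.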

rules_ios usage

Finally, I'd add this into a BUILD file. In practice this would go inside rules_ios macros

# BUILD file and rule consumption. For each `sdk_framework` - we'll need to
# add a module dependency. You'd need to handle transitive deps too
pcm_deps(name = "myfw_pcm_deps", sdk_frameworks = ["UIKit"])
apple_framework(name = "MyFw", deps = ["MyDeps"] + [":myfw_pcm_deps"], sdk_frameworks = ["UIKit"])
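
Handling the transitive deps mentioned above amounts to taking a closure over the module graph before emitting pcm_deps. A sketch, with a made-up dependency map:

```python
def transitive_modules(roots, deps):
    # Breadth-first closure over a module -> direct-deps map, so that
    # requesting UIKit also pulls in Foundation, ObjectiveC, and so on.
    seen = []
    queue = list(roots)
    while queue:
        mod = queue.pop(0)
        if mod in seen:
            continue
        seen.append(mod)
        queue.extend(deps.get(mod, []))
    return seen

# Hypothetical direct-dependency edges read out of the module maps
sdk_deps = {"UIKit": ["Foundation"], "Foundation": ["ObjectiveC"]}
```

This could run either in the scanner program ( baking full dep lists into the generated BUILD file ) or in the macro layer at loading time.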

BUILD runtime complexity and memory usage

  • search Xcode concurrently - O(N) over SDK frameworks + swift-provided module dependencies
  • map a given apple_framework's sdk_frameworks -> PCMs - O(N) in the number of sdk_modules
  • the hash table for analysis-phase lookups is small - O(N frameworks * N SDKs) total - where N SDKs is determined by the ones you give it

Conclusion

These are some of the problems I had thought of - and the implementation might not be word for word perfect, but IMO a t-shirt sized sketch of how this could work from the Xcode side is here. Would love to be educated on any of this

I'd also verify if it's possible to ditch the hash table lookups: e.g. by merging some of the ideas about pcm_deps into the first part 🤔

You'd likely need to give it an Xcode path / version - e.g. via an env var. Here are some caveats and fixes to related issues. I haven't confirmed if this will work end to end with arbitrary Xcode locations - without further patching of swift/LLVM to make this relative:

def _pcm_repo_impl(ctx):
    ## Does the heavy lifting
    ctx.file("BUILD", content = """
# Gen the PCM BUILD file here

# TODO: or in a similar rule, you can configure Xcode - for the repo
# e.g. define xcode providers - and fail at BUILD time if the
# user gives it a different CLI variable.
""", executable = False)

    ctx.execute(["mkdir", "-p", "external/" + ctx.name], environment = {}, quiet = True)

pcm_repo = repository_rule(
    implementation = _pcm_repo_impl,
    local = False,  # Not local - only rerun when the path changes - you might feed a version too.
    attrs = {
        "xcode_path": attr.string(default = ""),
        "sdk": attr.string(default = ""),  # Per-SDK pcm repo
    },
)

Then in the WORKSPACE:

# Patch to fix ( https://github.com/bazelbuild/bazel/pull/14328 ) - sans launchd
# The environment convention ( Xcode path ) is potentially necessitated by explicit PCM + RBE for other reasons. 
pcm_repo(name="local_configure_pcm_MagicSDK", xcode_path="/Applications/Xcode-13.3.1.app/Contents/Developer")
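
A sketch of the lookup order for that Xcode path - explicit attribute first, then the `DEVELOPER_DIR` convention that xcrun / xcodebuild honor, with `None` signalling that a launchd / xcode-locator search is still needed ( plain Python for illustration ):

```python
import os

def resolve_xcode_path(attr_path=None, env=None):
    # Preference order: the repo rule's explicit attribute, then the
    # DEVELOPER_DIR environment variable, else defer to xcode-locator.
    env = os.environ if env is None else env
    if attr_path:
        return attr_path
    dev_dir = env.get("DEVELOPER_DIR")
    if dev_dir:
        return dev_dir
    return None  # caller falls back to xcode-locator / `xcode-select -p`
```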

More complete/refined version - with launchd lookup and overrides:

# If the end to end V1 of this works with arbitrary Xcode locations for a version - allow searching
xcode_locator(name = "xcode_locator", version = "13.3.1")
pcm_repo(name = "local_configure_pcm_withLaunchServicesProvidedXcode", xcode_path = "@xcode_locator//:default")

def _xcode_locator_impl(ctx):
    # Does the heavy lifting
    # Call xcode-locator if you want to search with launch services.
    ctx.execute(["mkdir", "-p", "external/" + ctx.name], environment = {}, quiet = True)

    # Write the BUILD file with xcode-locator's result
    ctx.file("BUILD", content = "default=/path/to/xcode-locator-result")

xcode_locator = repository_rule(
    implementation = _xcode_locator_impl,
    local = True,
    attrs = {
        # Need this because you can't read it from the CLI?? - fact check this
        "version": attr.string(default = ""),
    },
    environ = [],  # TODO - we'd need to override this from the current Xcode.
)