dotnet/msbuild

Unable to have static-graph + isolated build + caching + // builds, working together

xoofx opened this issue · 4 comments

xoofx commented

Hey, this is a followup of #7110

So, I'm trying to create a build server that would be responsible for keeping a "solution" or a group of C# projects compiled as efficiently as possible by combining:

  • static graph
  • isolated builds
  • caching of results
  • parallelized builds

I have also tried to play around with ProjectCachePlugin to cover some of these aspects, but I'm hitting a restriction in the design of the caching that I'm not sure how to solve.

The main problem I have is parallelized builds in conjunction with the others.

Let's take an example. I have these project dependencies:

  • ProjRoot
    • LibA
      • LibLeaf
    • LibB
      • LibLeaf
    • LibC
      • LibLeaf

I would expect to issue a build + caching like this:

  • Build LibLeaf => store cache results LibLeaf.cache
  • Build LibA / LibB / LibC concurrently in isolated builds, with LibLeaf.cache as input. Each build would produce its own cache: LibA.cache, LibB.cache, LibC.cache
  • Build ProjRoot in an isolated build, without caching its output, but using the caches LibA.cache, LibB.cache, LibC.cache as input
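The three steps above amount to grouping the dependency graph into batches, where every project in a batch depends only on projects from earlier batches. Here is a minimal sketch of that grouping in Python, as a language-neutral model rather than MSBuild code. The names `build_batches` and `GRAPH` are hypothetical and used only for illustration.

```python
def build_batches(deps):
    """Group projects into batches that can each be built concurrently.

    deps maps a project name to the list of projects it references.
    Every project in a batch depends only on projects in earlier batches.
    """
    remaining = {project: set(refs) for project, refs in deps.items()}
    # Make sure referenced projects that have no entry of their own are known.
    for refs in deps.values():
        for ref in refs:
            remaining.setdefault(ref, set())

    batches = []
    while remaining:
        # Projects whose references have all been built already.
        ready = sorted(p for p, refs in remaining.items() if not refs)
        if not ready:
            raise ValueError("dependency cycle detected")
        batches.append(ready)
        for project in ready:
            del remaining[project]
        for refs in remaining.values():
            refs.difference_update(ready)
    return batches


# The example graph from this issue.
GRAPH = {
    "ProjRoot": ["LibA", "LibB", "LibC"],
    "LibA": ["LibLeaf"],
    "LibB": ["LibLeaf"],
    "LibC": ["LibLeaf"],
    "LibLeaf": [],
}

if __name__ == "__main__":
    for index, batch in enumerate(build_batches(GRAPH)):
        print(index, batch)
```

For the example graph this yields three batches: LibLeaf alone, then LibA / LibB / LibC together, then ProjRoot.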

The server would handle the caching state and keep it in live sync with the sources on disk (e.g. like the up-to-date check in VS). The ultimate benefit is that builds could be much faster than even VS today, because all results are cached: changing one project would not require recomputing the results of its dependencies...

From some early results with the prototype I built on the existing isolated caching, it can speed up the build of a single csproj by e.g. 10x. That's a lot. Extend that to an entire graph and it could be a game changer.

But I have hit a limitation that I initially didn't catch in the static-graph doc: isolation + caching can only happen in a BuildManager, and the input and output caches can only be set per BeginBuild/EndBuild.

This means I can compute all of the above only sequentially and on a single thread, which is very limiting.

So, instead, I have been trying to schedule the graph myself, by handling the scheduling similar to static-graph (so I copied the code here), I have added a way to serialize the results to disk (here) and thought that I could rely on project-cache through ProjectCachePlugin to load these serialized results.

Unfortunately, I discovered that ProjectCachePlugin is also only supported in the BuildManager scenario, which makes it useless in a parallel build.

I would have hoped that I could have issued builds by attaching the input/output to a request (instead of Begin/EndBuild), and that it could execute on msbuild Nodes instead.

buildManager.BeginBuild();

// Iterate over the projects of each group in parallel (as is done in static-graph scheduling)
// ....
loop on batch-able groups {
  loop on project on group {
     var request = new BuildRequestData(...);
     request.Inputs = ...;          // proposed: input caches attached per request
     request.OutputCache = ....;    // proposed: output cache attached per request
     var submission = buildManager.PendBuildRequest(request);
     submission.ScheduleAsync(...);
  }
}
  }
}
buildManager.EndBuild();
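To make the intent of the pseudocode above concrete, here is a small Python model of per-request caching. It is only a sketch under stated assumptions and calls no MSBuild API. `build_project` is a hypothetical stand-in for one isolated build, and `DEPS`, `BATCHES` and `build_graph` are illustrative names. For simplicity every project produces a cache here, including ProjRoot.

```python
from concurrent.futures import ThreadPoolExecutor

# Example graph from this issue, already grouped into concurrent batches.
DEPS = {
    "LibLeaf": [],
    "LibA": ["LibLeaf"],
    "LibB": ["LibLeaf"],
    "LibC": ["LibLeaf"],
    "ProjRoot": ["LibA", "LibB", "LibC"],
}
BATCHES = [["LibLeaf"], ["LibA", "LibB", "LibC"], ["ProjRoot"]]


def build_project(name, input_caches):
    """Hypothetical stand-in for one isolated build.

    A real build would load input_caches, build the project, and serialize
    its results. Here we only return the name of the output cache.
    """
    return f"{name}.cache"


def build_graph(batches, deps, max_workers=4):
    """Build batch by batch; each request carries its own input caches."""
    caches = {}  # project name -> output cache produced by its build
    log = []     # (project, input caches handed to its request)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches:
            # Inputs are attached per request, not per whole build session.
            inputs = {p: [caches[d] for d in deps[p]] for p in batch}
            futures = {p: pool.submit(build_project, p, inputs[p]) for p in batch}
            # Wait for the whole batch before starting the next one.
            for project, future in futures.items():
                caches[project] = future.result()
                log.append((project, inputs[project]))
    return caches, log


if __name__ == "__main__":
    _, build_log = build_graph(BATCHES, DEPS)
    for project, used in build_log:
        print(project, "<-", used)
```

The point of the model is that the input and output caches travel with each request, so the projects of one batch can run concurrently inside a single build session.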

The only solution I see is to build my own kind of build server nodes to do that, by hosting a BuildManager and performing my own input/output in these nodes... but ouch, that's a bit more work than expected...

Side note: the current loading of input/output caches is very limiting for a server, because it only supports loading from file paths, while I could also maintain an in-memory cache that would speed things up further.

Thoughts?

cc: @rainersigwald

xoofx commented

I would have hoped that I could have issued builds by attaching the input/output to a request (instead of Begin/EndBuild), and that it could execute on msbuild Nodes instead.

So I have hacked (here) the input/output cache files to be set per request instead of per BuildManager.Begin/EndBuild, so that the caching can run on a node, and it's working amazingly well.

I'm able to compile an entire graph of 100 C# projects in 3s, while VS/msbuild today takes a bit more than 7s to build it.

I would love to discuss with your team whether we could bring such a feature to msbuild.

That feels like a natural next step for the implementation. The initial design was driven by a requirement from the higher-order build system that the individual project builds have process-level isolation (for I/O tracking via Detours), but I don't think that needs to be a hard requirement for the overall system forever. My only concern is along the lines of "does this make it too easy to build a system that has underbuild problems because it doesn't fully understand inputs/outputs?", which I don't (at the moment) think is a great reason not to make the change.

xoofx commented

That feels like a natural next step for the implementation. The initial design was driven by a requirement from the higher-order build system that the individual project builds have process-level isolation (for I/O tracking via Detours), but I don't think that needs to be a hard requirement for the overall system forever.

I did read that, and to be honest I didn't fully understand why there was such a requirement. I also realized why they didn't need parallelism in the nodes: they were likely parallelizing the BuildManager itself (which is not great imo).

My only concern is along the lines of "does this make it too easy to build a system that has underbuild problems because it doesn't fully understand inputs/outputs?", which I don't (at the moment) think is a great reason not to make the change.

Yeah, as always in such circumstances, with great power comes great responsibilities! 😅

xoofx commented

I have cleaned up my changes to msbuild and made a draft PR #7121 to open discussion.