GitTools/GitVersion

[ISSUE]: GitVersion tool is slow / stalls

christophwille opened this issue · 14 comments

Prerequisites

  • I have written a descriptive issue title
  • I have searched issues to ensure it has not already been reported

GitVersion package

GitVersion.Tool

GitVersion version

6.0.2

Operating system

Linux

What are you seeing?

This is for:

      git status
      dotnet gitversion /version
      dotnet gitversion $(Build.SourcesDirectory) /output buildserver /nofetch

on Azure DevOps pipelines (git status was only added for testing, and because some old issue mentioned that the index might be the culprit).

I am seeing a "pause" in the log - and it always happens for the current PRs branch (pull/2271/merge) End and the Begin of the actual branch I created comparing to develop (GitFlow). Below are two runs, as you can see it seems to wait for 3 minutes for no reason at all.

2024-08-13T07:24:15.4623771Z     INFO [24-08-13 7:24:15:46] -< End: Finding merge base between 'pull/2271/merge' and 'origin/release/4.77.0'. (Took: 0.43ms) >-
2024-08-13T07:24:15.4667782Z   INFO [24-08-13 7:24:15:46] -< End: Finding branches source of 'pull/2271/merge' (Took: 368.42ms) >-
2024-08-13T07:27:26.9638056Z   INFO [24-08-13 7:27:26:96] -< Begin: Finding branches source of 'feature/gitversionperftest' >-
2024-08-13T07:27:26.9666509Z     INFO [24-08-13 7:27:26:96] -< Begin: Finding merge base between 'feature/gitversionperftest' and 'develop'. >-

2024-08-13T07:36:02.1232132Z     INFO [24-08-13 7:36:02:12] -< End: Finding merge base between 'pull/2271/merge' and 'origin/release/4.77.0'. (Took: 0.40ms) >-
2024-08-13T07:36:02.1277016Z   INFO [24-08-13 7:36:02:12] -< End: Finding branches source of 'pull/2271/merge' (Took: 376.28ms) >-
2024-08-13T07:39:15.4620032Z   INFO [24-08-13 7:39:15:46] -< Begin: Finding branches source of 'feature/gitversionperftest' >-
2024-08-13T07:39:15.4648616Z     INFO [24-08-13 7:39:15:46] -< Begin: Finding merge base between 'feature/gitversionperftest' and 'develop'. >-

This also happened with 6.0.0; I was testing 6.0.2 to see if there would be a difference.

I scrolled through all the "Took:" entries and found one that sticks out like a sore thumb:

2024-08-13T07:39:19.2643763Z INFO [24-08-13 7:39:19:26] -< End: Fetching the base versions for version calculation... (Took: 197,524.02ms) >-

However, that never adds up, because no other "Took:" ever exceeded the low hundreds of milliseconds; most of them were single-digit.

INFO [24-08-13 7:36:00:97] -< Begin: Normalizing git directory for branch 'refs/pull/2271/merge' >-
  INFO [24-08-13 7:36:01:00] One remote found (origin -> 'https://someazdourlredacted/myrepo').
  INFO [24-08-13 7:36:01:00] Skipping fetching, if GitVersion does not calculate your version as expected you might need to allow fetching or use dynamic repositories
  INFO [24-08-13 7:36:01:01] Creating local branch pull/2271/merge pointing at d4d958b
and then a lot more "Creating local branch" lines follow.

What is expected?

Of course, not seeing the pause.

Steps to Reproduce

Not sure if that reproduces outside of our project.

RepositoryFixture Test

No response

Output log or link to your CI build (if appropriate).

No response

I think it's a logging issue; we need to add more logs around the "Finding merge base between" step.

It'd be really nice to know where those three minutes went (maybe it is some out-of-whack branch or something).
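For illustration, here is a minimal sketch of such extra timing instrumentation, assuming a hypothetical TimedScope helper rather than GitVersion's actual logging API; it just mirrors the Begin/End "(Took: ...)" pattern already visible in the logs above, so the pause would show up as its own entry:

using System;
using System.Diagnostics;

// Hypothetical helper, not GitVersion's ILog: a disposable scope that logs
// Begin on construction and End with the elapsed time on Dispose.
sealed class TimedScope : IDisposable
{
    private readonly string _name;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    public TimedScope(string name)
    {
        _name = name;
        Console.WriteLine($"-< Begin: {_name} >-");
    }

    public void Dispose()
    {
        _stopwatch.Stop();
        Console.WriteLine($"-< End: {_name} (Took: {_stopwatch.Elapsed.TotalMilliseconds:N2}ms) >-");
    }
}

class Example
{
    static void Main()
    {
        // Wrap the suspect step so the missing three minutes get their own "Took:" figure.
        using (new TimedScope("Finding merge base between 'feature/x' and 'develop'"))
        {
            // ... call the merge-base lookup here ...
        }
    }
}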

I'm having this issue as well; it's very strange.

I've been able to narrow this down to the method below. When I run it against my repository, it takes ~1.5 minutes as long as I'm on a branch off of main. I did a quick profiling run and found that the various Where methods are called about 7 million times.

var returnedBranches = new HashSet<IBranch>();
var referenceLookup = this.repository.Refs.ToLookup(r => r.TargetIdentifier);
var commitBranches = FindCommitBranchesBranchedFrom(branch, configuration, excludedBranches).ToHashSet();

// Collect candidates whose commit was merged back into 'branch' (a merge commit
// on 'branch' has the candidate commit as a parent); these are exempt from the
// pruning pass below.
var ignore = new HashSet<BranchCommit>();
foreach (var commitBranch in commitBranches)
{
    foreach (var commit in branch.Commits.Where(element => element.When > commitBranch.Commit.When))
    {
        var parents = commit.Parents.ToArray();
        if (parents.Length > 1 && parents.Any(element => element.Equals(commitBranch.Commit)))
        {
            ignore.Add(commitBranch);
        }
    }
}

// Prune candidates whose commit also appears among the commits of an earlier candidate's branch.
foreach (var item in commitBranches.Skip(1).Reverse())
{
    if (ignore.Contains(item)) continue;

    foreach (var commitBranch in commitBranches)
    {
        if (item.Commit.Equals(commitBranch.Commit)) break;

        foreach (var commit in commitBranch.Branch.Commits.Where(element => element.When >= item.Commit.When))
        {
            if (commit.Equals(item.Commit))
            {
                commitBranches.Remove(item);
            }
        }
    }
}

// Prefer branches whose name matches an actual ref pointing at the commit;
// if none matches, fall back to returning every branch in the group.
foreach (var branchGrouping in commitBranches.GroupBy(element => element.Commit, element => element.Branch))
{
    bool referenceMatchFound = false;
    var referenceNames = referenceLookup[branchGrouping.Key.Sha].Select(element => element.Name).ToHashSet();

    foreach (var item in branchGrouping)
    {
        if (referenceNames.Contains(item.Name))
        {
            if (returnedBranches.Add(item)) yield return item;
            referenceMatchFound = true;
        }
    }

    if (!referenceMatchFound)
    {
        foreach (var item in branchGrouping)
        {
            if (returnedBranches.Add(item)) yield return item;
        }
    }
}

I see the nested loops. I was able to get that down to about 15 seconds, but it broke some unit tests, so clearly I'm a little out of my depth with what this method needs to do.

Hopefully someone else can take it a step further and figure out a good solution!
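As an aside, the "~7 million calls" figure mentioned above is the kind of number one can reproduce with a counting wrapper around the predicate. A minimal, self-contained sketch; CountedWhere is a hypothetical helper, not part of GitVersion, and the data here is just stand-in for branch.Commits:

using System;
using System.Collections.Generic;
using System.Linq;

static class ProfilingExtensions
{
    // Hypothetical helper: counts how many times a Where predicate is evaluated
    // during enumeration, using a single-element array as a mutable counter.
    public static IEnumerable<T> CountedWhere<T>(
        this IEnumerable<T> source, Func<T, bool> predicate, long[] counter) =>
        source.Where(item => { counter[0]++; return predicate(item); });
}

class Demo
{
    static void Main()
    {
        var counter = new long[1];
        var matches = Enumerable.Range(0, 1_000_000)
            .CountedWhere(n => n % 2 == 0, counter)
            .Count();
        Console.WriteLine($"{matches} matches, predicate evaluated {counter[0]:N0} times");
    }
}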

I have some proposed changes over at #4208. It appears the culprit is at least partially related to iterating over all the branch commits several different times.
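Not necessarily what #4208 does, but the general shape of that kind of fix is to walk each branch's commits once and answer the repeated queries from memory instead of re-enumerating the commit graph inside every nested loop. A simplified sketch with stand-in types (CachedBranch and the tuple shape are hypothetical, not GitVersion's actual IBranch/ICommit interfaces):

using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for a branch: the commit walk is materialized exactly once,
// and membership/time-window queries are served from the cached list afterwards.
class CachedBranch
{
    private readonly IEnumerable<(string Sha, DateTimeOffset When)> _walk;
    private List<(string Sha, DateTimeOffset When)> _commits;
    private HashSet<string> _shas;

    public CachedBranch(IEnumerable<(string Sha, DateTimeOffset When)> commitWalk) => _walk = commitWalk;

    // One enumeration of the underlying walk, ever.
    public IReadOnlyList<(string Sha, DateTimeOffset When)> Commits => _commits ??= _walk.ToList();

    // "Does this branch contain the commit?" becomes a set lookup instead of a scan.
    public bool Contains(string sha) =>
        (_shas ??= Commits.Select(c => c.Sha).ToHashSet()).Contains(sha);

    // The repeated Where(element => element.When > ...) passes scan the cached list,
    // not the repository.
    public IEnumerable<(string Sha, DateTimeOffset When)> CommitsAfter(DateTimeOffset when) =>
        Commits.Where(c => c.When > when);
}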

Hi there. Thank you very much for narrowing down this performance problem. I see your change as an improvement and think it has no downsides. Might it be an idea to do the same change for the collections below (a rough sketch of the pattern follows the list)?

  • BranchCollection
  • ReferenceCollection
  • RefSpecCollection
  • RemoteCollection
  • TagCollection
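A rough sketch of what such a cached collection could look like; CachedCollection is a hypothetical wrapper, not GitVersion's actual BranchCollection/ReferenceCollection types. It materializes the underlying enumeration once and serves that snapshot until it is explicitly refreshed:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: loads the underlying git objects once and reuses the
// snapshot for every enumeration until Refresh() invalidates it.
class CachedCollection<T> : IEnumerable<T>
{
    private readonly Func<IEnumerable<T>> _load;
    private IReadOnlyList<T> _snapshot;

    public CachedCollection(Func<IEnumerable<T>> load) => _load = load;

    // Drop the snapshot so the next enumeration re-reads the repository,
    // e.g. after normalization has created local branches.
    public void Refresh() => _snapshot = null;

    public IEnumerator<T> GetEnumerator() =>
        (_snapshot ??= _load().ToList()).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

The cache-refresh detail raised just below would then amount to calling Refresh() on each wrapped collection once normalization has finished rewriting refs.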

There is one detail we need to take into consideration: when normalization happens, we need to refresh the cache of these collections.

Indeed. Which is another argument for doing normalization as a separate command and step in the build.

Any sort of (interim) improvement is very welcome because it is getting worse for our builds by the day.

Same issue here. The Azure step is taking 11 minutes to run when it should only take seconds!

There is one detail we need to take into consideration: when normalization happens, we need to refresh the cache of these collections.

May I ask: the actual change to the CommitCollection is not affected by the normalization process and can be merged as a short-term hotfix to improve the performance, right?

I would say that we merge this change and do the other improvements in an extra ticket.

I think we'd better spend the time now improving the lazy loading and cache resetting for all the git object collections we have, rather than just the one in this PR, merely for the sake of having it merged. So if the author is willing to continue improving the git object collection loading, then we can help with that; otherwise I would rather improve it myself when I find some time.

So an interim QFE is really not an option? Even shaving off a few percent would be highly appreciated as a stopgap until a nicely architected, all-encompassing change is done and dusted.

The thing is, we previously had the experience where something was half-done, the original author of the changes was not willing to continue, and then we had to take it over and redo it. So my preference is to get this done first and then merge. I get that this small change can improve someone's specific case, but let's spend a bit more time and get it done.