metrics regexp usage for progress can be optimized

Question

metrics regexp usage for progress can be optimized

Closed this issue 2 months ago · 3 comments

Currently https://github.com/docker/buildx/blob/master/util/progress/metricwriter.go uses heavy regexp functions a lot.

This show both in inittrace, and with Find/Match functions when retrieving progress. It is present independently if tracing is enabled or not.

init github.com/docker/buildx/util/progress @30 ms, 0.077 ms clock, 41992 bytes, 224 allocs

For init we should avoid regexp.Compile in the init functions as it allocates lots of memory and blocks rest of the loading. While individually this isn't that significant, these bad practices by packages add up. This can be solved by adding sync.Once so the regexp instances are created one time on first access.

For the matchers, it looks like all the regexp always try to match against all of the vertex records coming in progress. This should only need to happen once per vertex. If vertex is type=exec it can not be type=source at the same time and does not need to be repeatedly rechecked. If a type for a specific digest has already been found, it does not need to be rechecked again when new logs are arriving.

I think the sync.Once can cover all the regexp together that then resolves the type info (and possible type specific values parsed from name) that is then memorized by digest.

While this would also be faster by doing a simpler string comparison in most cases, I don't think that is a good idea as the code becomes more error-prone. Instead, we should finish up returning type as a separate field in progress in buildkit side as was discussed in the original proposal for this. The buildx side of the code can already prepare for this (as it would need to keep regexp code for backwards compatibility anyway), so that there is a single function that determines the type based on string parsing that can then be skipped if the type is already set by buildkit.

@jsternberg

Answer 1 · 2024-07-19T10:10:24.000Z

could we use moby/moby#48166 or similar?

Answer 2 · 2024-07-19T16:07:51.000Z

That code for the lazy regular expression is pretty good. We could either just copy the one function or we could ask them to include it in a non-internal package, but it's probably easier to just copy it for now.

Answer 3 · 2024-07-19T16:38:14.000Z

There are a couple of other regexp that could be updated with lazy method, but for this probably a manual sync.Once makes more sense as we would need only one once for all the regexp instead of one per each. Bigger change probably is on the matching side.