hashicorp/go-slug

Avoid re-compiling rules

Opened this issue · 2 comments

I observed, while looking into #19, that due to the current implementation we're actually re-compiling every rule for every file. Updating the range in matchIgnoreRule to be by-reference improves performance on large repositories by a factor of ~3.

I independently noticed the same thing while working on ccbbeee and attempted to improve it, but since performace wasn't my main motivation in that commit I didn't spend much time studying the impact of the change I attempted there.

Looking back at it again with fresh eyes today I suspect I may not have actually fully addressed it: @skeggse's note about passing by reference instead of by value makes me notice that I seem to have made it just ineffectually precompile copies of the default rules, rather than the actual values that end up in the default ruleset:

var defaultExclusions = []rule{
{
val: strings.Join([]string{"**", ".git", "**"}, string(os.PathSeparator)),
excluded: false,
},
{
val: strings.Join([]string{"**", ".terraform", "**"}, string(os.PathSeparator)),
excluded: false,
},
{
val: strings.Join([]string{"**", ".terraform", "modules", "**"}, string(os.PathSeparator)),
excluded: true,
},
}
func init() {
// We'll precompile all of the default rules at initialization, so we
// don't need to recompile them every time we encounter a package that
// doesn't have any rules (the common case).
for _, r := range defaultExclusions {
err := r.compile()
if err != nil {
panic(fmt.Sprintf("invalid default rule %q: %s", r.val, err))
}
}
}

A small change that might make it effectual is to refer directly to the array elements by index, rather than working on the copies that the for ... range is implicitly making:

	for i := range defaultExclusions {
		err := defaultExclusions[i].compile()
		if err != nil {
			panic(fmt.Sprintf("invalid default rule %q: %s", r.val, err))
		}
	}

The default match object includes a slice value covering the same backing array, so I expect that a change like the above would make it effective, but I've not yet tested.

@brandonc i don't suppose you looked at this issue while you were working on the directory issue?