google/go-jsonnet

performance: reducing object allocation overhead via interning

Jesse-Cameron opened this issue

Hi friends,

I wanted to start a discussion about the performance of go-jsonnet, specifically how long it takes go-jsonnet to run bench.03.

Locally, running this snippet as a Go benchmark takes ~320ms and allocates ~190MB per run, which to me feels slower than it should be.
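In a `_test.go` file the usual way to get these numbers is a `func BenchmarkX(b *testing.B)` run with `go test -bench . -benchmem`. The sketch below computes the same three figures (time/op, alloc/op, allocs/op) by hand so it runs as a plain program. `evalSnippet` is a hypothetical stand-in workload, not the real bench.03 evaluation; in a real harness it would call go-jsonnet instead.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sink keeps results reachable so the compiler cannot discard the work.
var sink []string

// evalSnippet is a hypothetical stand-in for the code under test. In the real
// benchmark it would evaluate bench.03 with go-jsonnet, e.g. via
// jsonnet.MakeVM().EvaluateAnonymousSnippet(name, src).
func evalSnippet(src string) []string {
	out := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		out = append(out, fmt.Sprintf("%s-%d", src, i))
	}
	return out
}

// measure runs f n times and reports the average wall time, heap bytes and
// heap objects allocated per run: the time/op, alloc/op and allocs/op figures.
func measure(n int, f func()) (nsPerOp, bytesPerOp, allocsPerOp float64) {
	var before, after runtime.MemStats
	runtime.GC() // start from a quiet heap
	runtime.ReadMemStats(&before)
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)
	nsPerOp = float64(elapsed.Nanoseconds()) / float64(n)
	bytesPerOp = float64(after.TotalAlloc-before.TotalAlloc) / float64(n)
	allocsPerOp = float64(after.Mallocs-before.Mallocs) / float64(n)
	return
}

func main() {
	ns, bytes, allocs := measure(1000, func() { sink = evalSnippet("x") })
	fmt.Printf("%.0f ns/op  %.0f B/op  %.0f allocs/op\n", ns, bytes, allocs)
}
```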


Investigation

I started with some initial profiling using pprof, but there was no low-hanging fruit that I felt could dramatically improve performance. That led me to a theory: could there be a really high allocation overhead each time a new function call is pushed onto the call stack?
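A toy sketch of that theory (not go-jsonnet's actual evaluator; `frame` and `call` are hypothetical names): if every call allocates a fresh bindings frame, the allocation count grows with the number of calls, and the standard library's `testing.AllocsPerRun` can count those allocations directly.

```go
package main

import (
	"fmt"
	"testing"
)

// frame is a hypothetical call frame: one map of local bindings per call.
type frame struct {
	vars   map[string]int
	parent *frame
}

// sink keeps the frame reachable so it is heap-allocated, as in an interpreter.
var sink *frame

// call simulates entering a function: each invocation allocates a new frame
// and a new bindings map, the kind of per-call overhead suspected above.
func call(parent *frame, params []string, args []int) *frame {
	f := &frame{vars: make(map[string]int, len(params)), parent: parent}
	for i, p := range params {
		f.vars[p] = args[i]
	}
	return f
}

func main() {
	params := []string{"x", "y"}
	args := []int{1, 2}
	// AllocsPerRun reports the average number of heap allocations per call.
	allocs := testing.AllocsPerRun(1000, func() {
		sink = call(nil, params, args)
	})
	fmt.Printf("allocations per call: %.0f\n", allocs)
}
```

The exact count depends on the Go version's map implementation, which is why the program prints it rather than assuming a number.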

After reading through issue #111, I wanted to experiment with using string interning to reduce allocations. Using the go4.org/intern package, I replaced the Identifier type and updated the references to it.

diff --git a/ast/ast.go b/ast/ast.go
index 90e970f..94b2a19 100644
--- a/ast/ast.go
+++ b/ast/ast.go
@@ -19,15 +19,25 @@ package ast

 import (
        "fmt"
+
+       "go4.org/intern"
 )

 // Identifier represents a variable / parameter / field name.
 // +gen set
-type Identifier string
+type Identifier *intern.Value

+func NewIdentifier(s string) Identifier {
+       return Identifier(intern.GetByString(s))
+}
+
+func GetString(i *intern.Value) string {
+       return i.Get().(string)
+}
+

You can check the whole changeset here. Getting the stdast dump to work with the interned strings was a bit of fun 😅!
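For readers unfamiliar with interning, here is a minimal, stdlib-only sketch of the idea behind `intern.GetByString`: every distinct string maps to one canonical pointer, so identifiers compare with a single pointer `==` and duplicate names share one allocation. `Value`, `Interner` and `Intern` are hypothetical names; unlike go4.org/intern, which lets unused values be garbage collected, this toy keeps every entry forever and uses a plain mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Value is a hypothetical interned handle: exactly one *Value exists per
// distinct string, so handles can be compared with == (a pointer compare).
type Value struct{ s string }

// Get returns the interned string.
func (v *Value) Get() string { return v.s }

// Interner maps each string to its canonical handle.
type Interner struct {
	mu sync.Mutex
	m  map[string]*Value
}

func NewInterner() *Interner { return &Interner{m: make(map[string]*Value)} }

// Intern returns the canonical handle for s, allocating only the first time
// s is seen. This sketch never frees entries.
func (in *Interner) Intern(s string) *Value {
	in.mu.Lock()
	defer in.mu.Unlock()
	if v, ok := in.m[s]; ok {
		return v
	}
	v := &Value{s: s}
	in.m[s] = v
	return v
}

func main() {
	in := NewInterner()
	a := in.Intern("foo")
	b := in.Intern(string([]byte("foo"))) // equal contents, built separately
	c := in.Intern("bar")
	fmt.Println(a == b, a == c, a.Get()) // true false foo
}
```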

Unfortunately, the performance improvement from this change left me wanting more. Shaving a handful of milliseconds and megabytes off execution doesn't feel like something users will notice.

name   old time/op    new time/op    delta
_VM-8     316ms ± 0%     298ms ± 0%   ~     (p=1.000 n=1+1)

name   old alloc/op   new alloc/op   delta
_VM-8     186MB ± 0%     155MB ± 0%   ~     (p=1.000 n=1+1)

name   old allocs/op  new allocs/op  delta
_VM-8     3.64M ± 0%     3.64M ± 0%   ~     (p=1.000 n=1+1)
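To put the benchstat figures above in relative terms, a small helper computes the percentage change for each row; the inputs are the numbers reported above, and `pctDelta` is a name introduced here. One plausible reading is that alloc/op drops by about a sixth while allocs/op stays at 3.64M, so interning made allocations smaller but not fewer, which may explain the modest time/op gain. Note also that `~ (p=1.000 n=1+1)` means benchstat had a single sample per side and could not mark the change as significant; collecting several runs (for example with `go test -count=10`) would give it enough data.

```go
package main

import "fmt"

// pctDelta returns the relative change from before to after, in percent.
func pctDelta(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	fmt.Printf("time/op:   %+.1f%%\n", pctDelta(316, 298))   // -5.7%
	fmt.Printf("alloc/op:  %+.1f%%\n", pctDelta(186, 155))   // -16.7%
	fmt.Printf("allocs/op: %+.1f%%\n", pctDelta(3.64, 3.64)) // +0.0%
}
```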

Discussion

If the maintainers think it's worthwhile, I'm more than happy to clean up the code above and submit a PR. But I'm still questioning a few things:

  • Is there a more effective way to reduce allocations other than interning?
  • Is the ROI of adding string interning worth it?
  • Did I miss something obvious in my interning implementation that caps the perf gain?

Thanks for reading this far! Keen to hear folks' thoughts 😁 😁