Is it possible to reduce allocations when interning a byte slice?
marselester opened this issue · 5 comments
Hi, thank you for creating an awesome interning package!
Could you please advice whether it's possible to reduce allocations when interning a byte slice (a buffer that changes on every iteration and might contain duplicate strings)?
// InternString doesn't allocate.
func InternString(s string) string {
return intern.GetByString(s).Get().(string)
}
// InternBytes allocates.
func InternBytes(b []byte) string {
return intern.GetByString(string(b)).Get().(string)
}
// UnsafeInternBytes doesn't allocate but b mustn't be modified.
func UnsafeInternBytes(b []byte) string {
return intern.GetByString(unsafeString(b)).Get().(string)
}
func unsafeString(b []byte) string {
return *(*string)(unsafe.Pointer(&b))
}
For example, when interning is implemented using map[string]string
, it would allocate only if a string wasn't found.
type Strings map[string]string
func (ss Strings) InternBytes(b []byte) string {
if original, isInterned := ss[string(b)]; isInterned {
return original
}
s := string(b)
ss[s] = s
return s
}
Not with this package, no. If you don't care about growing your map forever you can use a map, as you show. This package cares about letting the GC still do its job and reclaim things.
We would like to avoid growing the map indefinitely, but we would also like to avoid the allocation caused by converting the []byte
to a string
in the case where the string is already interned.
I had hoped that we could just add GetByStringBytes(b []byte) *Value
or similar to do this, but looking at the implementation of get
, I don't think that would work as I don't think Go would optimize out the string allocation of valMap[key{s: string(b), isString: true}]
the way it does with m[string(b)]
. Perhaps a fork that only handles string values would make sense.
https://github.com/josharian/intern might (might!) be a better fit for you
We would like to avoid growing the map indefinitely, but we would also like to avoid the allocation
These are hard to automatically satisfy at the same time. See also the discussion at https://commaok.xyz/post/intern-strings/
These are hard to automatically satisfy at the same time. See also the discussion at https://commaok.xyz/post/intern-strings/
Thanks for the link to the library and the article!
Looking at the implementation of github.com/go4org/intern
, I think it is possible to get the desired behavior with a fork that replaces the current key with a string key. I believe this would lead to somewhat more predictable interning behavior than the implementation in github.com/josharian/intern
, but it comes with the disadvantage of having to pass around *intern.Value
instead of the underlying string.
Perhaps something like golang/go#43615 or other changes to the runtime will make all of this more convenient in the future.