cmd/compile: optimize Write([]byte(stringVal)) to not copy the string
ianlancetaylor opened this issue · 11 comments
It would be nice to optimize Write([]byte(stringVal))
to not copy the string value. This is normally safe because most Write
methods do not modify the byte slice passed in. In fact, the documentation for io.Writer
requires that the Write
method not modify the byte slice, although this is not (and can not be) enforced.
Here is how we can do this.
For each function and method that takes a parameter of slice type, record whether the slice's underlying array is modified. This will require generating annotations similar to the ones we generate for escape analysis. Assembly functions will require an annotation similar to //go:noescape. (Naturally, a slice that escapes must be treated as modified.)
For each call of the form F([]byte(stringVal))
, where we know that F
does not modify the slice, we can pass the string value directly. This would do essentially the same thing as the existing optimization for map lookups in which a []byte
is converted to a string
. This fixes direct calls, but of course the interesting cases all involve calls to methods of values of type io.Writer
.
For any type with a Write
method that does modify the slice, generate at compile time an additional Write·2
method that makes a copy of the slice and then calls the Write
method. The method is named Write·2
to ensure that it does not conflict with any user written method.
When converting any type to io.Writer
(or any interface type that inherits from io.Writer
), check for the presence of a Write·2
method. If that method exists, add it to the interface as an additional entry in the itab.
For any call to an interface method Write([]byte(stringVal))
, modify the call to call a special function doWrite
in the io package, without copying the string. doWrite
will check for a Write·2
method, and call it if it exists; that will cause the string to be copied as happens today. If the Write·2
method does not exist, which will be the normal case, doWrite
will call the Write
method as usual, knowing that the Write
method does not modify the slice.
Generalizing this to other methods is left as an exercise for the reader.
I feel like there was a dup bug with the same plan but rejected because as I recall: nobody should be working with such large strings and stay in bytes if they want performance.
Was this motivated by something concrete?
There was no specific motivation; it was just an idea I had.
Your compiler annotation for Write
sounds a lot like the compiler secretly interpreting Write([]byte)
to be Write(immutable []byte)
; the same could be done for other pointer arguments that are provably unmodified.
Such an (internal) annotation would have additional benefits:
- Methods that do not modify their receiver won't require callers to re-load fields (I don't know how common this is or how much it would matter, but it's something)
- Functions/methods with only immutable arguments (including implicit globals) may be hoisted, etc. (Similar to
__attribute__((const))
and__attribute__((pure))
.
This would remove the need for my attempted optimization in #13848, which tried to optimize string writing at the standard library level.
How does this work for nested calls of io.Writer
?
func (t *T) Write(p []byte) (int, error) {
return t.wr.Write(p)
}
You do not know if T.Write
mutates p
or not because it depends on whether t.wr.Write
mutates p
. To be conservative, the compiler will need to assume that it is mutated and then it will emit T.Write·2
, removing any benefit from the proposed optimization.
@dsnet For a case like that you turn T.Write·2
into a copy of the entire method, that passes p
along uninterpreted for possibly copying by t.wr.Write·2
.
@ianlancetaylor, Doesn't that approach fail for this example (that nobody should ever write):
t := &T{wr: ...}
t.Write([]byte("hello"))
func (t *T) Write(p []byte) (int, error) {
n, err := t.wr.Write(p) // Assume t.wr.Write mutates p, but not known at compile time
fmt.Println(p[0]) // Read p, expect to see mutated value
return n, err
}
So T.Write·2
is a copy of T.Write
but the call to t.wr.Write
is replaced by doWrite(t.wr, p)
. If t.wr.Write
does mutate p
, then we are guaranteed that the "hello" string is not accidentally mutated (correct). However, the fmt.Println(p[0])
will not be able to observe the mutation that t.wr.Write
caused because of the copy (wrong).
I guess that this is solvable if doWrite
was changed such that the new copy was known to the caller. Something like func doWrite(w io.Writer, p *[]byte)
You're right: in general, if the method does something with the buffer after passing it to Write
, it may not be possible to apply this optimization to callers of that method.
Related to #2205 (cmd/compile: read-only escape analysis and avoiding string <-> []byte copies).
I create a thread on #golang-dev to discuss some possible optimizations for this.