cmd/compile: optimize Write([]byte(stringVal)) to not copy the string

Question

ianlancetaylor opened this issue 8 years ago · 11 comments

It would be nice to optimize Write([]byte(stringVal)) to not copy the string value. This is normally safe because most Write methods do not modify the byte slice passed in. In fact, the documentation for io.Writer requires that the Write method not modify the byte slice, although this is not (and cannot be) enforced.
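As a small illustration of why the copy exists today, here is a sketch with a hypothetical upperWriter that breaks the io.Writer contract by modifying its argument in place. The type and the writeCopy helper are made up for this example. Because []byte(s) copies, the string itself is never touched.

```go
package main

import (
	"fmt"
	"io"
)

// upperWriter is a hypothetical Writer that (incorrectly) modifies its
// argument in place, which the io.Writer documentation forbids.
type upperWriter struct{}

func (upperWriter) Write(p []byte) (int, error) {
	for i, c := range p {
		if 'a' <= c && c <= 'z' {
			p[i] = c - 'a' + 'A'
		}
	}
	return len(p), nil
}

// writeCopy does what Write([]byte(s)) does today: it copies s into a
// fresh slice, writes that slice, and returns it for inspection.
func writeCopy(w io.Writer, s string) []byte {
	b := []byte(s) // allocates and copies the bytes of s
	w.Write(b)
	return b
}

func main() {
	s := "hello"
	b := writeCopy(upperWriter{}, s)
	fmt.Println(s, string(b)) // prints "hello HELLO"
}
```

The optimization discussed here would skip the copy only when the callee is known not to behave like upperWriter.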

Here is how we can do this.

For each function and method that takes a parameter of slice type, record whether the slice's underlying array is modified. This will require generating annotations similar to the ones we generate for escape analysis. Assembly functions will require an annotation similar to //go:noescape. (Naturally, a slice that escapes must be treated as modified.)

For each call of the form F([]byte(stringVal)), where we know that F does not modify the slice, we can pass the string value directly. This would do essentially the same thing as the existing optimization for map lookups in which a []byte is converted to a string. This fixes direct calls, but of course the interesting cases all involve calls to methods of values of type io.Writer.
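The existing map lookup optimization mentioned above can be observed with testing.AllocsPerRun. This is only a sketch: the helper names are made up, and the zero-allocation result is what I would expect from the gc toolchain rather than something the language guarantees.

```go
package main

import (
	"fmt"
	"testing"
)

// lookupAllocs reports the average number of allocations per
// m[string(key)] lookup.
func lookupAllocs(m map[string]int, key []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		_ = m[string(key)] // conversion is used only as a map key
	})
}

// demoLookupAllocs runs lookupAllocs on a small example map.
func demoLookupAllocs() float64 {
	const k = "a fairly long key that is longer than thirty-two bytes"
	m := map[string]int{k: 1}
	return lookupAllocs(m, []byte(k))
}

func main() {
	fmt.Println(demoLookupAllocs())
}
```

The compiler can do this for map lookups because it knows the map code only reads the key during the lookup. The proposal needs the same kind of knowledge about F.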

For any type with a Write method that does modify the slice, generate at compile time an additional Write·2 method that makes a copy of the slice and then calls the Write method. The method is named Write·2 to ensure that it does not conflict with any user written method.

When converting any type to io.Writer (or any interface type that inherits from io.Writer), check for the presence of a Write·2 method. If that method exists, add it to the interface as an additional entry in the itab.

For any call to an interface method Write([]byte(stringVal)), modify the call to call a special function doWrite in the io package, without copying the string. doWrite will check for a Write·2 method, and call it if it exists; that will cause the string to be copied as happens today. If the Write·2 method does not exist, which will be the normal case, doWrite will call the Write method as usual, knowing that the Write method does not modify the slice.
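The dispatch described above can be modeled in ordinary Go. In this sketch the WriteCopy method stands in for the compiler-generated Write·2 (a name user code cannot declare), and a byte slice named shared stands in for the string's memory. The types counter and zeroer are hypothetical.

```go
package main

import (
	"fmt"
	"io"
)

// copyingWriter stands in for "has a Write·2 method". The name WriteCopy
// is an assumption made for this sketch.
type copyingWriter interface {
	WriteCopy(p []byte) (int, error)
}

// doWrite models the proposed helper: call the copying method if the
// type has one, otherwise pass the shared bytes straight to Write.
func doWrite(w io.Writer, shared []byte) (int, error) {
	if cw, ok := w.(copyingWriter); ok {
		return cw.WriteCopy(shared)
	}
	return w.Write(shared)
}

// counter only reads p, so it would get no Write·2 method.
type counter struct{ n int }

func (c *counter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

// zeroer overwrites p, so the compiler would generate Write·2 for it.
type zeroer struct{}

func (zeroer) Write(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// WriteCopy copies p and then calls Write, as Write·2 would.
func (z zeroer) WriteCopy(p []byte) (int, error) {
	q := make([]byte, len(p))
	copy(q, p)
	return z.Write(q)
}

// demo writes the same shared bytes to both writers.
func demo() (int, string) {
	shared := []byte("hello")
	c := &counter{}
	doWrite(c, shared)
	doWrite(zeroer{}, shared)
	return c.n, string(shared)
}

func main() {
	fmt.Println(demo()) // prints "5 hello"
}
```

Only zeroer pays for a copy. counter receives the shared bytes directly, which is the normal case the proposal is aiming for.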

Generalizing this to other methods is left as an exercise for the reader.

Answer 1 · 2017-01-27T20:51:02.000Z

I feel like there was a duplicate bug with the same plan, but it was rejected because, as I recall, nobody should be working with such large strings, and anyone who wants performance should stay in bytes.

Was this motivated by something concrete?

Answer 2 · 2017-01-27T21:20:58.000Z

There was no specific motivation; it was just an idea I had.

Answer 3 · 2017-01-27T21:26:15.000Z

Your compiler annotation for Write sounds a lot like the compiler secretly interpreting Write([]byte) to be Write(immutable []byte); the same could be done for other pointer arguments that are provably unmodified.

Such an (internal) annotation would have additional benefits:

Methods that do not modify their receiver won't require callers to re-load fields (I don't know how common this is or how much it would matter, but it's something)
Functions/methods with only immutable arguments (including implicit globals) may be hoisted, etc. (Similar to __attribute__((const)) and __attribute__((pure)).)

Answer 4 · 2017-01-27T21:30:40.000Z

This would remove the need for my attempted optimization in #13848, which tried to optimize string writing at the standard library level.

Answer 5 · 2017-01-27T21:36:02.000Z

How does this work for nested calls of io.Writer?

func (t *T) Write(p []byte) (int, error) {
	return t.wr.Write(p)
}

You do not know if T.Write mutates p or not because it depends on whether t.wr.Write mutates p. To be conservative, the compiler will need to assume that it is mutated and then it will emit T.Write·2, removing any benefit from the proposed optimization.

Answer 6 · 2017-01-27T21:45:46.000Z

@dsnet For a case like that you turn T.Write·2 into a copy of the entire method, that passes p along uninterpreted for possibly copying by t.wr.Write·2.

Answer 7 · 2017-01-27T21:58:49.000Z

@ianlancetaylor, doesn't that approach fail for this example (that nobody should ever write):

	t := &T{wr: ...}
	t.Write([]byte("hello"))

func (t *T) Write(p []byte) (int, error) {
	n, err := t.wr.Write(p) // Assume t.wr.Write mutates p, but not known at compile time
	fmt.Println(p[0]) // Read p, expect to see mutated value
	return n, err
}

So T.Write·2 is a copy of T.Write but the call to t.wr.Write is replaced by doWrite(t.wr, p). If t.wr.Write does mutate p, then we are guaranteed that the "hello" string is not accidentally mutated (correct). However, the fmt.Println(p[0]) will not be able to observe the mutation that t.wr.Write caused because of the copy (wrong).

I guess that this is solvable if doWrite was changed such that the new copy was known to the caller. Something like func doWrite(w io.Writer, p *[]byte)
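A rough model of that pointer-taking variant, under stated assumptions: doWritePtr is a hypothetical name, it always copies (modeling the case where the nested writer has a Write·2 method), and writeAndPeek stands in for the generated T.Write·2. A byte slice named shared represents the string's memory.

```go
package main

import (
	"fmt"
	"io"
)

// mutator is a hypothetical Writer that overwrites the first byte of p.
type mutator struct{}

func (mutator) Write(p []byte) (int, error) {
	if len(p) > 0 {
		p[0] = 'J'
	}
	return len(p), nil
}

// doWritePtr copies *p, stores the copy back through p so the caller
// sees it, and then writes the copy.
func doWritePtr(w io.Writer, p *[]byte) (int, error) {
	q := append([]byte(nil), (*p)...)
	*p = q
	return w.Write(q)
}

// T wraps another writer, as in the example above.
type T struct{ wr io.Writer }

// writeAndPeek models T.Write·2: it forwards p and then reads p[0],
// expecting to observe any mutation made by the nested Write.
func (t *T) writeAndPeek(p []byte) byte {
	doWritePtr(t.wr, &p)
	return p[0]
}

// demo returns the byte observed by writeAndPeek and the shared bytes.
func demo() (byte, string) {
	shared := []byte("hello")
	t := &T{wr: mutator{}}
	first := t.writeAndPeek(shared)
	return first, string(shared)
}

func main() {
	first, shared := demo()
	fmt.Printf("%c %s\n", first, shared) // prints "J hello"
}
```

The caller observes the mutation through the replaced slice while the original bytes stay intact.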

Answer 8 · 2017-01-27T22:17:35.000Z

You're right: in general, if the method does something with the buffer after passing it to Write, it may not be possible to apply this optimization to callers of that method.

Answer 9 · 2017-01-28T10:30:10.000Z

Avoiding the allocation is exactly the reason we have a separate WriteString interface. I think we should figure out the question of immutable types in general before adding this kind of workaround. (I read that it's on Russ' plan for 2017 to research more about immutability in Go.) BTW, we probably don't need to make the compiler work so hard for this. One intermediate step that is probably good enough would be to allow a trivial, allocation-free WriteString implemented as: func (t *T) WriteString(s string) (int, error) { return t.Write([]byte(s)) } when the compiler can statically prove that the Write method won't write to the string.
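For reference, this is how the existing WriteString path is picked up today: io.WriteString checks whether the writer implements io.StringWriter and calls WriteString directly if so. The recorder type below is hypothetical and only counts which method was used.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// recorder is a hypothetical writer that counts calls to each method.
type recorder struct {
	sb        strings.Builder
	viaString int
	viaBytes  int
}

func (r *recorder) Write(p []byte) (int, error) {
	r.viaBytes++
	return r.sb.Write(p)
}

func (r *recorder) WriteString(s string) (int, error) {
	r.viaString++
	return r.sb.WriteString(s)
}

// demo writes a string through io.WriteString and reports which method
// the recorder saw, plus the accumulated output.
func demo() (int, int, string) {
	r := &recorder{}
	io.WriteString(r, "hello")
	return r.viaString, r.viaBytes, r.sb.String()
}

func main() {
	fmt.Println(demo()) // prints "1 0 hello"
}
```

The cost of this approach is that every writer type has to provide WriteString by hand, which is what the suggestion above would make trivial.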

Answer 10 · 2017-02-01T08:37:46.000Z

Related to #2205 (cmd/compile: read-only escape analysis and avoiding string <-> []byte copies).

Answer 11 · 2020-01-16T15:20:05.000Z

I created a thread on #golang-dev to discuss some possible optimizations for this.