golang/go

runtime: provide Pinner API for object pinning

ansiwen opened this issue · 117 comments

Update, 2021-10-20: the latest proposal is the API in #46787 (comment).


Problem Statement

The pointer passing rules state:

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers.

and

Go code may not store a Go pointer in C memory.

There are C APIs, most notably the iovec-based ones for vectored I/O, which expect an array of structs that describe buffers to read into or write from. The naive approach would be to allocate both the array and the buffers with C.malloc() and then either work on the C buffers directly or copy the content to Go buffers. In the case of Go bindings for a C API, which is presumably the most common use case for Cgo, the users of the bindings shouldn't have to deal with C types, which means that all data has to be copied into Go-allocated buffers. This of course impairs performance, especially for larger buffers. It would therefore be desirable to have a safe way to let the C API write directly into the Go buffers. This, however, is not possible because

  • either the buffer array is allocated in C memory, but then the pointers of the Go buffers can't be stored in it. (Storing Go pointers in C memory is forbidden.)
  • or the buffer array is allocated in Go memory and the Go buffer pointers are stored in it. But then the pointer to that buffer array can't be passed to a C function. (Passing a Go pointer that points to memory containing other Go pointers to a C function is forbidden.)

Obviously, what is missing is a safe way to pin an arbitrary number of Go pointers in order to store them in C memory or in passed-to-C Go memory for the duration of a C call.

Workarounds

Break the rules and store the Go pointer in C memory


with something like

IovecCPtr.iov_base = unsafe.Pointer(myGoPtr)

but GODEBUG=cgocheck=2 would catch that.

However, you can circumvent cgocheck=2 with this casting trick:

*(*uintptr)(unsafe.Pointer(&IovecCPtr.iov_base)) = uintptr(myGoPtr)

This might work as long as the GC does not move objects, which happens to be true today but is not guaranteed.

Break the rules and hide the Go pointer in Go memory


with something like

type iovecT struct {
  iov_base uintptr
  iov_len  C.size_t
}
iovec := make([]iovecT, numberOfBuffers)
for i := range iovec {
  bufferPtr := unsafe.Pointer(&bufferArray[i][0])
  iovec[i].iov_base = uintptr(bufferPtr)
  iovec[i].iov_len = C.size_t(len(bufferArray[i]))
}
n := C.my_iovec_read((*C.struct_iovec)(unsafe.Pointer(&iovec[0])), C.int(numberOfBuffers))

Again: This might work, as long as the GC is not moving the pointers. GODEBUG=cgocheck=2 wouldn't complain about this.

Break the rules and temporarily disable cgocheck


If hiding the Go pointer as a uintptr like in the last workaround is not possible, passing Go memory that contains Go pointers usually bails out because of the default cgocheck=1 setting. It is possible to temporarily disable cgocheck during a C call, which can be especially useful when the pointers have been "pinned" with one of the later workarounds. For example, the _cgoCheckPointer() function, which is used in the generated Cgo code, can be shadowed in the local scope, which disables the check for the following C calls in that scope:

func ... {
  _cgoCheckPointer := func(interface{}, interface{}) {}
  C.my_c_function(x, y)
}

A perhaps slightly more robust approach is to access the runtime.dbgvars list via go:linkname:

type dbgVar struct {
	name  string
	value *int32
}

//go:linkname dbgvars runtime.dbgvars
var dbgvars []dbgVar

var cgocheck = func() *int32 {
	for i := range dbgvars {
		if dbgvars[i].name == "cgocheck" {
			return dbgvars[i].value
		}
	}
	panic("Couldn't find cgocheck debug variable")
}()

func ... {
	before := *cgocheck
	*cgocheck = 0
	C.my_c_function(x, y)
	*cgocheck = before
}

Use a C function to store the Go pointer in C memory


The rules allow a C function to store a Go pointer in C memory for the duration of the call. So, for each Go pointer, a C function can be called from its own goroutine; it stores the Go pointer in C memory and then calls a Go callback that waits for a release signal. After the release signal is received, the Go callback returns to the C function, the C function clears the Go pointer from the C memory and returns as well, finishing the goroutine.

This approach fully complies with the rules, but is quite expensive, because each goroutine that is blocked in a C call occupies its own OS thread, which means one thread per stored Go pointer.
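The control flow of this workaround can be sketched in pure Go. In the sketch below, the hypothetical holdUntilReleased stands in for the C function: it keeps the pointer alive for as long as it blocks, and a channel serves as the release signal. In the real workaround the blocking function would be a cgo call that stores the pointer in C memory; all names here are illustrative assumptions and are not taken from any library.

```go
package main

import (
	"fmt"
	"runtime"
)

// holdUntilReleased stands in for the C function of the workaround. It
// signals that the pointer is held, blocks until the release signal
// arrives, and keeps p alive until that point.
func holdUntilReleased(p *byte, held chan<- struct{}, release <-chan struct{}) {
	close(held) // the pointer is now "stored"
	<-release   // block until the caller is done with it
	runtime.KeepAlive(p)
}

// pin starts one goroutine per pointer and returns a function that sends
// the release signal and waits for that goroutine to finish.
func pin(p *byte) (unpin func()) {
	held := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		holdUntilReleased(p, held, release)
	}()
	<-held // do not return before the pointer is held
	return func() {
		close(release)
		<-done
	}
}

func main() {
	buf := make([]byte, 8)
	unpin := pin(&buf[0])
	buf[0] = 42 // the buffer could be handed to C here
	unpin()
	fmt.Println(buf[0])
}
```

The cost mentioned above is visible in the structure: every call to pin needs its own goroutine, and in the cgo version each of those goroutines would sit in a blocked C call.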

Use the //go:uintptrescapes compiler directive


//go:uintptrescapes is a compiler directive that

specifies that the function's uintptr arguments may be pointer values that have been converted to uintptr and must be treated as such by the garbage collector.

So, similar to the previous workaround, a Go function with this directive can be called in a goroutine and simply wait for a release signal. When the signal is received, the function returns and the pointer is released.

This already seems almost like a proper solution, so I implemented a package with this approach that allows you to Pin() a Go pointer and Poke() it into C memory: PtrGuard

But there are still caveats. The compiler and the runtime (cgocheck=2) don't seem to know which pointers are protected by the directive, because they still don't allow passing Go memory containing these Go pointers to a C function, or storing the pointers in C memory. Therefore the first two workarounds are additionally necessary. There is also the small overhead of the goroutine and the release signalling.

Proposal

It would make Cgo a lot more usable for C APIs with more complex pointer handling, like iovec, if there were a programmatic way to provide what //go:uintptrescapes already provides through the back door. There should be a way to pin an arbitrary number of Go pointers in the current scope, so that they are allowed to be stored in C memory or be contained in Go memory that is passed to a C function within this scope, for example with a runtime.PtrEscapes() function. It's cumbersome that one has to abuse goroutines, channels and casting tricks in order to provide bindings to such C APIs. As long as the Go GC is not moving pointers, it could be a trivial implementation, but it would encapsulate this knowledge and would give users a guarantee.

I know from the other issues and discussions around this topic that it's seen as dangerous if it is possible to pin an arbitrary number of pointers. But

  1. it is possible to call an arbitrary number of C or //go:uintptrescapes functions, therefore it is already possible to pin an arbitrary number of Go pointers.
  2. it is necessary for some C APIs

Related issues: #32115, #40431

/cc @ianlancetaylor @rsc @seebs

edit: the first workaround had an incorrect statement.
edit 2: add workarounds for disabling cgocheck

From what I can tell from the documentation for the new cgo.Handle, it's intended only for a situation where a pointer needs to be passed from Go to C and then back to Go without the C code doing anything with what it points to. As it passes a handle ID, not a real pointer, the C code can't actually get access to the actual data. Maybe a function could be provided on the C side that takes a handle ID and returns the original pointer, thus allowing the C code to access the data? Would that solve this issue?

Edit: Wait, that doesn't make sense. Could you just use Handle to make sure that it's held onto? Could the definition of Handle be extended to mean that the pointer itself is valid for the duration of the Handle's existence? In other words, this would be defined to be valid:

package main

// void doSomethingWithAPointer(int *a);
import "C"

import "runtime/cgo"

func main() {
  v := C.int(3)
  h := cgo.NewHandle(&v)
  C.doSomethingWithAPointer(&v) // Safe because the handle exists for that pointer.
  h.Delete()
}

Alternatively, if that's not feasible, what about a method on Handle that returns a valid pointer for the given value?

// Pointer returns a C pointer that points to the underlying value of the handle
// and is valid for the life of the handle.
func (h Handle) Pointer() C.uintptr_t

Disclaimer: I'm not familiar enough with the internals of either the Go garbage collector or Cgo to know if either of these even make sense.

@DeedleFake As you pointed out yourself, the cgo.Handle has a very different purpose. It's just a registry for a map from a C-compatible arbitrary ID (uintptr) to an arbitrary Go value. Its purpose is to refer to a Go value in the C world, not to access it from there. It doesn't affect the behavior of the garbage collector, which could still freely move the values in the Handle map around, and would never free them, since they are referenced by the map.

A big advantage of the current cgo mechanisms, including go:uintptrescapes, is that the pointers are automatically unpinned when the cgo function returns. As far as I can see you didn't propose any particular mechanism for pinning pointers, but it would be very desirable to somehow ensure that the pointers are unpinned. Otherwise code could easily get into scenarios in which pointers remain pinned forever, which if Go ever implements a full moving garbage collector will cause the garbage collector to silently behave quite poorly. In other words, some APIs that could solve this problem will be footguns: code that can easily cause a program to silently behave badly in ways that will be very hard to detect.

It's hard to say more without a specific API to discuss. If you suggested one, my apologies for missing it.

@ianlancetaylor thanks for taking the time to answer.

A big advantage of the current cgo mechanisms, including go:uintptrescapes, is that the pointers are automatically unpinned when the cgo function returns.

I agree, that is an advantage. However, with goroutines it's trivial to fire-and-forget thousands of such function calls that never return.

As far as I can see you didn't propose any particular mechanism for pinning pointers, but it would be very desirable to somehow ensure that the pointers are unpinned. Otherwise code could easily get into scenarios in which pointers remain pinned forever, which if Go ever implements a full moving garbage collector will cause the garbage collector to silently behave quite poorly. In other words, some APIs that could solve this problem will be footguns: code that can easily cause a program to silently behave badly in ways that will be very hard to detect.

I didn't describe a specific API, that's true. I hoped that this could be developed here together once we agreed on the requirements. One of the requirements that I mentioned was that the pinning happens only for the current scope. That implies automatic unpinning when the scope is left. Sorry that I didn't make that clear enough. So, to rephrase more compactly, the requirements would be:

  • possibility to pin pointers in the current scope (exactly as if they would be the argument of a C function call)
  • automatic unpinning when the current scope is left (the current function returns)
  • cgocheck knows about the pinning and does not complain

It's hard to say more without a specific API to discuss. If you suggested one, my apologies for missing it.

As stated above, I didn't want to suggest a specific API, but characteristics of it. In the end it could be a function like runtime.PtrEscapes(unsafe.Pointer). The usage could look like this:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
  numberOfBuffers := len(bufferArray)

  iovec := make([]C.struct_iovec, numberOfBuffers)

  for i := range iovec {
    bufferPtr := unsafe.Pointer(&bufferArray[i][0])
    runtime.PtrEscapes(bufferPtr) // <- pins the pointer and makes it known to escape to C
    iovec[i].iov_base = bufferPtr
    iovec[i].iov_len = C.size_t(len(bufferArray[i]))
  }

  n := C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
  // ^^^ cgocheck doesn't complain, because Go pointers in iovec are pinned
  return int(n) // <- all pinned pointers in iovec are unpinned
}

As long as the GC is not moving, runtime.PtrEscapes() is almost a no-op, it would basically only tell cgocheck not to bail out for these pointers. But users would have a guarantee, that if the GC becomes moving later, this function will take care of it.

Regarding footguns, I'm pretty sure that the workarounds that have to be used at the moment to solve these problems will cause more "programs to silently behave badly" than the potential abuse of a proper pinning method.

it would be very desirable to somehow ensure that the pointers are unpinned

Drawing from runtime.KeepAlive, one possibility might be something like:

package runtime

// Pin prevents the object to which p points from being relocated until
// the returned PointerPin either is unpinned or becomes unreachable.
func Pin[T any](p *T) PointerPin

type PointerPin struct {…}
func (p PointerPin) Unpin() {}

Then the example might look like:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
	numberOfBuffers := len(bufferArray)

	iovec := make([]C.struct_iovec, numberOfBuffers)

	for i := range iovec {
		bufferPtr := &bufferArray[i][0]
		defer runtime.Pin(bufferPtr).Unpin()
		iovec[i].iov_base = unsafe.Pointer(bufferPtr)
		iovec[i].iov_len = C.size_t(len(bufferArray[i]))
	}

	n := C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
	return int(n)
}

A vet warning could verify that the result of runtime.Pin is used, to ensure that it is not accidentally released too early (see also #20803).

@ansiwen when you write "automatic unpinning when the current scope is left (the current function returns)", the current scope you refer to is the scope of the Go function, correct? In your example that would be ReadFileIntoBufferArray.
I'm trying to double-check what the behavior would be if we needed to make multiple calls into C using the same pointer.

@bcmills's version also looks very natural to me, and in that version it's clear that the pointer would be pinned until the defer at the end of ReadFileIntoBufferArray.

@ansiwen when you write "automatic unpinning when the current scope is left (the current function returns)", the current scope you refer to is the scope of the Go function, correct? In your example that would be ReadFileIntoBufferArray.

@phlogistonjohn Yes, exactly.

@bcmills's version also looks very natural to me, and in that version it's clear that the pointer would be pinned until the defer at the end of ReadFileIntoBufferArray.

Yes, I also would prefer @bcmills's version from a user's perspective, because it's more explicit and it's basically the same API that we use with PtrGuard.

I just don't know enough about the implications on the implementation side and the effects on the Go internals, so I don't know which API would be more feasible. My proposal is about providing an official way to solve the described problem. I really don't care so much about the "form", that is, what exactly the API looks like. Whatever works best with the current Go and Cgo implementation. 😊

@bcmills I guess an argument @ianlancetaylor might bring up against your API proposal is that it would allow storing the PointerPin value in a variable and keeping the pointers pinned for an unlimited time, so it would not "ensure that the pointers are unpinned". If the unpinning is implicit, it is more comparable to //go:uintptrescapes.

@ianlancetaylor

it would be very desirable to somehow ensure that the pointers are unpinned.

So, if you want to enforce the unpinning, the only strict RAII pattern in Go that I could come up with is using a scoped constructor like this API:

package runtime

// Pinner is the context for pinning pointers with Pin()
// can't be copied or constructed outside a Pinner scope
type Pinner struct {…}

// Pin prevents the object to which p points from being relocated until
// Pinner becomes invalid.
func (Pinner) Pin(p unsafe.Pointer) {...}

func WithPinner(func(Pinner)) {...}

which would be used like this:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
    numberOfBuffers := len(bufferArray)
    
    iovec := make([]C.struct_iovec, numberOfBuffers)

    var n C.ssize_t
    runtime.WithPinner(func (pinner runtime.Pinner) {
        for i := range iovec {
            bufferPtr := unsafe.Pointer(&bufferArray[i][0])
            pinner.Pin(bufferPtr)
            iovec[i].iov_base = bufferPtr
            iovec[i].iov_len = C.size_t(len(bufferArray[i]))
        }
        
        n = C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
    }) // <- All pinned pointers are released here and pinner is invalidated (in case it's copied out of scope).
    return int(n)
}

I personally would prefer a thinner API, where either the pointer must be explicitly unpinned, like in the proposal of @bcmills, or, even better, the pinning implicitly creates a defer for the scope from which the pinning function was called. Given that this will be implemented in the runtime package, I guess there are tricks and magic that can be used there.

@ansiwen Even with the func API you suggest, a user might store the argument in a closed-over variable, to have it survive the function. In general, as long as the pin is represented by some value, we can't prevent that value from being kept around. So I don't think your version has significant safety benefits compared to @bcmills's, while being more unwieldy and also potentially heavier in runtime cost (the closure might make it easier for things to escape).

Personally, as long as the PointerPin has to be intentionally kept around, I think that's fine. I think the suggestion to unpin when the PointerPin becomes unreachable already makes it sufficiently hard to shoot yourself in the foot to tolerate the risk. And we might be able to use go vet for additional safety (like warning if the result of Pin is assigned to a global var or something).

@Merovius

@ansiwen Even with the func API you suggest, a user might store the argument in a closed-over variable, to have it survive the function. In general, as long as the pin is represented by some value, we can't prevent that value from being kept around. So I don't think your version has significant safety benefits compared to @bcmills's, while being more unwieldy and also potentially heavier in runtime cost (the closure might make it easier for things to escape).

The "keeping-around" can easily be prevented by one pointer indirection that gets invalidated when the scope is left. You can have a look at my implementation of PtrGuard, which even has a test case for exactly the case of a variable escaping the scope.
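The pointer-indirection idea can be sketched as follows. This is not the PtrGuard implementation and not a real runtime API: Pinner, WithPinner and pinState are hypothetical names modelled on the proposal above, and Pin only does bookkeeping instead of talking to the garbage collector. The point is that every copy of the Pinner shares one state object, so invalidating that state when the scope ends also disables copies that escaped.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pinState is shared by all copies of a Pinner through one pointer.
type pinState struct {
	valid  bool
	pinned []unsafe.Pointer
}

// Pinner is a hypothetical handle; copying it copies only the pointer.
type Pinner struct{ s *pinState }

// Pin records ptr. A real implementation would also inform the runtime;
// this sketch only tracks the bookkeeping.
func (p Pinner) Pin(ptr unsafe.Pointer) {
	if !p.s.valid {
		panic("Pinner used outside its WithPinner scope")
	}
	p.s.pinned = append(p.s.pinned, ptr)
}

// WithPinner runs f with a fresh Pinner and invalidates it afterwards.
func WithPinner(f func(Pinner)) {
	s := &pinState{valid: true}
	defer func() {
		s.pinned = nil  // release everything
		s.valid = false // make escaped copies unusable
	}()
	f(Pinner{s})
}

// usedAfterScope reports whether using an escaped Pinner panics.
func usedAfterScope() (panicked bool) {
	var escaped Pinner
	x := new(int)
	WithPinner(func(p Pinner) {
		p.Pin(unsafe.Pointer(x)) // fine inside the scope
		escaped = p              // smuggle a copy out of the scope
	})
	defer func() { panicked = recover() != nil }()
	escaped.Pin(unsafe.Pointer(x)) // the shared state is already invalid
	return false
}

func main() {
	fmt.Println(usedAfterScope())
}
```

This turns late use into a loud failure, but as noted in the reply above, it cannot stop a caller from simply never returning from the function passed to WithPinner.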

Personally, as long as the PointerPin has to be intentionally kept around, I think that's fine. I think the suggestion to unpin when the PointerPin becomes unreachable already makes it sufficiently hard to shoot yourself in the foot to tolerate the risk. And we might be able to use go vet for additional safety (like warning if the result of Pin is assigned to a global var or something).

Yeah, I agree, as I wrote before, I'm totally fine with both. It's just something I came up with to address @ianlancetaylor's concerns. I also think that the risks are "manageable", there are all kinds of other risks when dealing with runtime and/or unsafe packages after all.

I think that the API proposed by @bcmills is the most useful one. Although there is a risk of forgetting to unpin a pointer, once Go gets a moving garbage collector, certain low-level uses will need certain blocks of memory to stay pinned for the duration of the program. That is certainly the case for some system calls on Linux, such as those for frame buffers. In other words, Pin and Unpin are also useful without cgo.
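For readers on Go 1.21 or newer, the standard library's runtime.Pinner type offers explicit Pin and Unpin methods of the kind discussed here, and it can be used without cgo. The sketch below assumes such a toolchain; sumFirstBytes is a hypothetical helper that only reads the pinned buffers, where real code would pass them to a system call or a C function between Pin and Unpin.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumFirstBytes pins the backing array of every non-empty buffer, reads
// from it, and unpins everything before returning.
func sumFirstBytes(bufs [][]byte) int {
	var pinner runtime.Pinner
	defer pinner.Unpin() // releases every pointer pinned below

	sum := 0
	for _, b := range bufs {
		if len(b) == 0 {
			continue
		}
		pinner.Pin(&b[0]) // the object holding b[0] stays put until Unpin
		sum += int(b[0])
	}
	return sum
}

func main() {
	a := make([]byte, 2)
	a[0] = 1
	b := make([]byte, 1)
	b[0] = 3
	fmt.Println(sumFirstBytes([][]byte{a, b, nil}))
}
```

Because the Go garbage collector does not currently move heap objects, pinning has no visible effect in this program; the call only records the guarantee.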

hnes commented

Hi @rsc, any updates on this issue recently? I noticed it has been several days since the 2021-08-04 review meeting minutes.

rsc commented

The compiler/runtime team has been talking a bit about this but don't have any clear suggestions yet.

The big problem with pinning is that if we ever want a moving garbage collector in the future, pins will make it much more complex. That's why we've avoided it so far.

/cc @aclements

The big problem with pinning is that if we ever want a moving garbage collector in the future, pins will make it much more complex. That's why we've avoided it so far.

@rsc But my point in the description was, that we have pinning already when C functions are called with Go pointers or when the //go:uintptrescapes directive is used. So the situation is complex already, isn't it?

@rsc I would say the converse is also true. If you are going to implement a moving garbage collector without support for pinning, that will make it much more complex to use Go for certain direct operating system calls without cgo, e.g. on Linux.
In other words, as @ansiwen says, there's really no way to avoid that complexity. And therefore I think it would be better if Go supported it explicitly than through workarounds.

Unbounded pinning has the potential to be significantly worse than bounded pinning. If people accidentally or intentionally leave many pointers pinned, that can fragment the spaces that the GC uses, and make it very hard for a moving GC to make any progress at all. This can in principle happen with cgo today, but it is unlikely that many programs pass a bunch of pointers to a cgo function that never returns. When programmers control the pinning themselves, bugs are more likely. If the bug is in some imported third party library, the effect will be strange garbage collection behavior for the overall program. This will be hard to understand and hard to diagnose, and it will be hard to find the root cause. (One likely effect will be a set of tools similar to the memory profiler that track pinned pointers.)

It's also worth noting that we don't have a moving garbage collector today, so any problems that pinned pointers may introduce for a moving garbage collector will not be seen today. So if we ever do introduce a moving garbage collector, we will have a flag day of hard-to-diagnose garbage collection problems. This will make it that much harder to ever change the garbage collector in practice.

So I do not think the current situation is nearly as complex as the situation would be if we add unbounded pinning. This doesn't mean that we shouldn't add unbounded pinning. But I think that it does mean that the argument for it has to be something other than "we can already pin pointers today."
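The fragmentation concern can be made concrete with a toy model. The sketch below is not how any real collector works: it assumes a heap of equally sized slots and a compaction step that slides every movable object to the lowest free slot while pinned objects stay in place. All names (slot, compact, largestFreeRun) are made up for illustration.

```go
package main

import "fmt"

type slot int

const (
	free slot = iota
	movable
	pinned
)

// compact slides every movable object to the lowest free slot, while
// pinned objects stay where they are, and returns the new layout.
func compact(heap []slot) []slot {
	out := make([]slot, len(heap)) // all slots start out free
	n := 0                         // number of movable objects
	for i, s := range heap {
		switch s {
		case pinned:
			out[i] = pinned
		case movable:
			n++
		}
	}
	for i := 0; i < len(out) && n > 0; i++ {
		if out[i] == free {
			out[i] = movable
			n--
		}
	}
	return out
}

// largestFreeRun returns the length of the longest run of free slots.
func largestFreeRun(heap []slot) int {
	best, run := 0, 0
	for _, s := range heap {
		if s == free {
			run++
			if run > best {
				best = run
			}
		} else {
			run = 0
		}
	}
	return best
}

func main() {
	noPins := []slot{movable, free, movable, free, movable, free, movable, free}
	somePins := []slot{movable, free, movable, pinned, movable, free, pinned, free}
	fmt.Println(largestFreeRun(compact(noPins)), largestFreeRun(compact(somePins)))
}
```

In the second layout three slots are free after compaction, but the two pinned objects split them so that the largest contiguous run is only two slots, whereas the unpinned layout compacts into one run of four.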

@ianlancetaylor That is fair enough. But then it seems to me the best way ahead is to put this issue on hold until we can implement a prototype moving garbage collector.

There is always a workaround if there is no pinning available and that is to manually allocate memory directly from the OS so the GC doesn't know about it. It is not ideal but it can work.

Yeah, one workaround that is missing from the discussion is hiding the C API allocation concerns, e.g. iovec could be implemented like:

package iovec

type Buffers struct {
	Data [][]byte

	data *C.uint8_t
	list *C.iovecT
}

func NewBuffers(sizes []int) *Buffers {
	...
	// C.malloc everything
	// cast from *C.uint8_t to []byte
}

func (buffers *Buffers) ReadFrom(f *os.File) error { ...

Or in other words, from the problem statement, it's unclear why it's required to use bufferArray [][]byte as the argument.

@ianlancetaylor

So I do not think the current situation is nearly as complex as the situation would be if we add unbounded pinning. This doesn't mean that we shouldn't add unbounded pinning. But I think that it does mean that the argument for it has to be something other than "we can already pin pointers today."

Let's separate the two questions "pinning yes/no" and "pinning bounded/unbounded".

pinning yes/no

I also proposed

  1. an API that allows bounded pinning (runtime.WithPinner()).
  2. the potential possibility of a runtime.Pin() with no return value and an implicit defer that automatically gets unpinned when the current function returns.

Both provide a similar behaviour as the //go:uintptrescapes directive, if that is what you mean with "bounded". What do you think of these options?

pinning bounded/unbounded

  1. when we have a moving GC, there will always also be a way to pin pointers or pause the moving, so this needs to be implemented in any case. Is this correct?
  2. when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status quo, right? So, what exactly do you mean by "hard-to-diagnose garbage collection problems"?
  3. would the risk of many pointers that are never unpinned not be similar to that of memory leaks, like with global dynamic data structures, that are possible now? I know, memory fragmentation is potentially worse than just allocating memory, but the effect would be similar: OOM errors.

For me personally the first question is more important. Bounded or unbounded, I think the existing and required ways of pinning should be made less hacky in their usage.

@egonelbre

Or in other words, from the problem statement, it's unclear why it's required to use bufferArray [][]byte as the argument.

The bufferArray [][]byte is just a placeholder for an arbitrary "native Go data structure". As the problem statement mentions, the goal is to avoid copying of the data. Especially vectored I/O is used for big amounts of data, so depending on the use case, you can't choose the target data structure by yourself, but it is provided by another library that you intend to use (let's say video processing for example). That would mean, that in all these cases you have to copy the data from your own C allocated data structure to the Go-allocated target data structure of your library, for no good reason.

when we have a moving GC, there will always also be a way to pin pointers or pause the moving, so this needs to be implemented in any case. Is this correct?

In some manner, yes.

when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status quo, right? So, what exactly do you mean by "hard-to-diagnose garbage collection problems"?

A GC that is based on moving pointers is not the same as a GC that does not move pointers. A GC based on moving pointers may be completely blocked by a pinned pointer, whereas for a non-moving GC a pinned pointer is just the same as a live pointer.

would the risk of many pointers that are never unpinned not be similar to that of memory leaks, like with global dynamic data structures, that are possible now? I know, memory fragmentation is potentially worse than just allocating memory, but the effect would be similar: OOM errors.

Same answer.

Again, all I am saying is that arguments based on "we already support pinned pointers, so it's OK to add more" are not good arguments. We need different arguments.

hnes commented

How would we deal with the iovec struct during vectored I/O syscall if we have a GC that is based on moving pointers? Maybe the same solution could also be applied to the pointer pinning we are discussing?

A GC based on moving pointers may be completely blocked by a pinned pointer.

I'm afraid that would badly impact the GC latency or something else if it is true. Please consider the disk i/o syscall that may block a very long time.

@ianlancetaylor

when we have a moving GC, there will always also be a way to pin pointers or pause the moving, so this needs to be implemented in any case. Is this correct?

In some manner, yes.

when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status quo, right? So, what exactly do you mean by "hard-to-diagnose garbage collection problems"?

A GC that is based on moving pointers is not the same as a GC that does not move pointers. A GC based on moving pointers may be completely blocked by a pinned pointer, whereas for a non-moving GC a pinned pointer is just the same as a live pointer.

Since you agreed that the pinning is required in the answer before, I don't understand how such an implementation could be used in Go.

Again, all I am saying is that arguments based on "we already support pinned pointers, so it's OK to add more" are not good arguments. We need different arguments.

I don't think "add more" is the right wording. It's more about exposing the pinning in a better way. And these are not arguments for doing it, but arguments against the supposed risks of doing it.

The argument for doing it should be clear by now: give people a zero-copy way to use APIs like iovec with Go data structures in a future proof way. At the moment, that's not possible.

In your answers you skipped the first part about the bounded pinning. If you have the time to comment on these too, I would be very interested. 😊

Since you agreed that the pinning is required in the answer before, I don't understand how such an implementation could be used in Go.

The current system for pinning pointers doesn't permit pointers to be pinned indefinitely, if we discount the unusual case of a C function that does not return.

I agree that other systems that somehow ensure that pointers can't be pinned indefinitely are better. (I don't think that an implicit defer is a good approach for Go, though.)

Here's another minimalistic API proposal for bounded pinning (basically a programmatic version of the uintptrescapes directive):

package runtime

// PtrEscapes prevents the allocated objects referenced by ptrs from being relocated until
// function f returns.
func PtrEscapes(ptrs []unsafe.Pointer, f func())

Example:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
	var buffers []unsafe.Pointer
	numberOfBuffers := len(bufferArray)

	iovec := make([]C.struct_iovec, numberOfBuffers)

	for i := range iovec {
		bufferPtr := unsafe.Pointer(&bufferArray[i][0])
		buffers = append(buffers, bufferPtr)
		iovec[i].iov_base = bufferPtr
		iovec[i].iov_len = C.size_t(len(bufferArray[i]))
	}

	var n C.ssize_t
	runtime.PtrEscapes(buffers, func() {
		n = C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
	})
	return int(n)
}

rsc commented

I think the main question we need to answer is whether the runtime/GC team wants to commit to any pinning API at all.
/cc @aclements

@ianlancetaylor

The current system for pinning pointers doesn't permit pointers to be pinned indefinitely, if we discount the unusual case of a C function that does not return.

Is there a reason to believe that we can discount that case today?

I would not be at all surprised to see Go programs that, say, transfer control to a C main-like function that then makes callbacks back into Go for certain parts of the program. If I recall correctly, some C GUI toolkits actually require the program to be structured in a very similar way.

@hnes raised a concrete example in #46787 (comment): We are primarily discussing iovecs, and a readv/writev/etc could very likely be on a blocking FD and block indefinitely.

Here's a mini-proposal for a pinning mechanism that avoids some of the downsides mentioned above, particularly the problem of forgetting to unpin any pinned things.

Arguments of cgo calls are pinned for the duration of the call, as they are now. In addition, this proposal lets you mark an object as "pinning flow-through" (terrible name, please suggest better ones). If a "pinning flow-through" object is pinned, then any objects it references are also pinned (for the duration of the outer pin). This way, you can't introduce any pinning roots; you can only make the "scope" of pinning during a cgo call larger. All pins are effectively dropped when the cgo call returns.

You would use this, for example, to mark the [][]byte object that you pass to writev as pinning flow-through. When that [][]byte is passed to writev, every []byte referenced is also pinned, for the duration of writev.

The runtime would keep track of this mark for the lifetime of the object, similar to how we keep track of finalizers currently. You would only need to mark an object once - you could use it many times for many cgo calls.

There would be no way to unmark a pinning flow-through object. (Although we could add such a thing, if people thought it was needed.) The critical feature that makes this proposal better than a raw pinning API is that it doesn't matter if we have lots of objects scattered around the heap marked as pinning flow-through. Only if they are referenced by a root pinning operation (aka used as an argument to a cgo call) do those marks mean anything.

pinning flow-through objects don't pin recursively. Only objects directly referenced from the pinning flow-through object are pinned. If you want deeper pinning, you'd have to mark everything but the leaves of the tree you want pinned.

Possibly having an actual runtime.SetPinningFlowThrough(object interface{}) API would be overkill, and it could be enough to have a special //go: annotation on system calls that would mark arguments as pinning flow-through for the duration of the call. Not sure if that would be enough, or if it would be easier than having an explicit runtime call.
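
To make the flow-through rules above concrete, here is a toy model in plain Go. It does not touch the runtime: the `object` type, the `flowThrough` field and `pinnedFor` are all hypothetical names invented for this sketch. It only computes which objects would count as pinned for one call, under the assumption that pinning spreads from a pinned object to its direct referents only when that object carries the mark.

```go
package main

import "fmt"

// object is a toy stand-in for a heap object; refs lists what it points to.
type object struct {
	name        string
	refs        []*object
	flowThrough bool // the hypothetical "pinning flow-through" mark
}

// pinnedFor returns the names of the objects that would be pinned while a
// cgo call with the given arguments is in flight, in visiting order.
func pinnedFor(args ...*object) []string {
	seen := make(map[*object]bool)
	var names []string
	var pin func(o *object)
	pin = func(o *object) {
		if o == nil || seen[o] {
			return
		}
		seen[o] = true
		names = append(names, o.name)
		if !o.flowThrough {
			return // unmarked objects do not extend the pin to their referents
		}
		for _, r := range o.refs {
			pin(r)
		}
	}
	for _, a := range args {
		pin(a) // call arguments are the only pinning roots
	}
	return names
}

func main() {
	inner := &object{name: "inner"}
	buf0 := &object{name: "buf0", refs: []*object{inner}}
	buf1 := &object{name: "buf1"}
	vec := &object{name: "vec", refs: []*object{buf0, buf1}, flowThrough: true}

	fmt.Println(pinnedFor(vec)) // [vec buf0 buf1]

	// Deeper pinning requires marking every non-leaf object on the path.
	buf0.flowThrough = true
	fmt.Println(pinnedFor(vec)) // [vec buf0 inner buf1]
}
```

Marked objects that are not reachable from a call argument never appear in the result, which is the property the mini-proposal relies on.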

Does that work in the io_uring model?

In that model, you write the [][]byte to a ring buffer shared with the kernel (effectively). And then you "submit" one or more entries in the ring buffer via a syscall; that syscall will immediately return (or more precisely, when that syscall returns, it doesn't necessarily mean the IOs that were just submitted have completed).

A later submission syscall could indicate to the caller that previously submitted IOs have completed. (It does this not via the mere fact that the syscall returned, but via a separate completion ring buffer shared between the application and the kernel, which the application has to process.)

It feels like the pin might expire too early in your model (i.e. with the next call to io_uring_submit, rather than when the application pulls a matching entry from the completion queue ring buffer).
And for that matter, the pin might start too late, because it needs to start before we start writing into the shared ring buffer, not when we invoke the io_uring_submit syscall.

(I suppose one option would be for the standard library to offer a higher-level io_uring API, e.g. a blocking SubmitBatch([]Submissions) that queues the submissions on an io_uring, waits until those specific completions have been received and then returns from SubmitBatch. The compiler would then use SubmitBatch as the scope of the pin, rather than a cgo call)

Does that work in the io_uring model?

No, I don't think it does. Pointers that must be pinned while no cgo call is currently active would not be supported.
(You'd have to allocate such things with C.malloc.)

I think that io_uring is actually a really interesting example of a bigger problem. For example, we already have a very similar problem in internal/poll on Windows: it uses I/O completion ports, which do pass Go pointers into the kernel across asynchronous system calls (much like io_uring). Granted, that's "internal" so technically we could do whatever we needed to make that work, but it shows that this is not just a theoretical problem with an API Go doesn't support yet. Another example is that it's common in graphics code to share long-lived graphics buffers with the kernel and the hardware, which would also require long-lived pinned memory. I'm not sure whether this is a problem in Go's current OpenGL packages, but it wouldn't surprise me.

Another example is that it's common in graphics code to share long-lived graphics buffers with the kernel and the hardware, which would also require long-lived pinned memory. I'm not sure whether this is a problem in Go's current OpenGL packages, but it wouldn't surprise me.

In my experience, pinned Go memory wouldn't help for GPU API. It's true that legacy OpenGL has API (glVertexAttribPointer, perhaps others) that retains user-provided pointers, but modern OpenGL and every other API (Direct3D, Metal, Vulkan) all operate on API-allocated buffer objects that you either map into your address space or copy into synchronously. All because GPUs can't in general access system memory as efficiently (or at all).

Here's another thought, inspired by something @cherrymui said: what if we provided explicit pinning/unpinning operations, but pinned memory also stayed live. In a lot of cases you want that anyway, and it would create an incentive for users to unpin memory even with a non-moving collector.

Perhaps the hazard here is that if users pinned memory at a relatively slow rate and didn't unpin it, this would simply create a memory leak. But at some point these are all power tools we have to trust users to use correctly anyway.

We could probably limit the number of pinned pointers, maybe something like N+M*(number of ongoing cgo calls), and crash the program if it exceeds the limit (maybe allow user to bump up the limit).

what if we provided explicit pinning/unpinning operations, but pinned memory also stayed live.

@aclements I always presumed, pinning implies keeping alive. That's also how uintptrescapes works.

I always presumed, pinning implies keeping alive.

I think it's important to consider that aspect separately. For uintptrescapes, the pointer is kept live by virtue of being in a live argument on the stack. Something is actively using that pointer (to the best of our knowledge), so it really only makes sense to keep it live.

For @randall77's "pin-through" proposal, I think the same argument would apply to keeping it live while it's in use by a cgo call, but I don't think it would make sense for the act of marking something pin-through to keep it live. (Maybe I'm wrong, though; I haven't thought very hard about that interaction.)

For explicit pin/unpin operations, it's much less clear to me. Certainly it would be kept live during the cgo call. But I bet it often makes sense to allocate something, pin it, pass it to cgo, and then just drop it on the floor and let the GC take care of it without worrying about unpinning it. There are other mechanisms to extend its lifetime if that's necessary (e.g., runtime.KeepAlive).

I think pinning should imply keeping alive.

In some very deep sense, GC simulates having infinite memory. From that perspective, collecting an unreachable object is the same as relocating it to an unnameable location.

Pinning an object prevents it from being relocated at all, which should also prevent it from being relocated to the bit-bucket.

FWIW, that's why I think the pin itself makes sense as a PointerPin object with its own lifetime, kept alive by a pending call to Unpin. An object can be relocated to the bit-bucket only when the pins attached to it can also be relocated there.

If we wanted to make it even more obvious when users have forgotten to unpin their pinned memory, we could throw an error if a PointerPin object becomes unreachable without first being unpinned.
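
A minimal sketch of that idea, assuming a hypothetical `PointerPin` type built on `runtime.SetFinalizer`. It only models the lifetime rules (the handle keeps the object reachable, and a handle collected without `Unpin` is reported); it does not pin anything. `Pin`, `onLeak` and the field names are invented for illustration, and since finalizers run at an unspecified time, the leak report is best-effort.

```go
package main

import (
	"fmt"
	"runtime"
)

// PointerPin is a hypothetical handle for one pinned object. Holding obj
// keeps the object reachable for as long as the handle itself is reachable.
type PointerPin struct {
	obj      any
	released bool
}

// onLeak runs when a PointerPin is collected without Unpin having been
// called. A real runtime might throw instead of printing.
var onLeak = func(p *PointerPin) {
	fmt.Println("PointerPin collected without Unpin")
}

// Pin returns a handle for obj and arms the leak detector.
func Pin(obj any) *PointerPin {
	p := &PointerPin{obj: obj}
	runtime.SetFinalizer(p, func(p *PointerPin) {
		if !p.released {
			onLeak(p)
		}
	})
	return p
}

// Unpin releases the object and disarms the leak detector.
func (p *PointerPin) Unpin() {
	p.released = true
	p.obj = nil
	runtime.SetFinalizer(p, nil)
}

func main() {
	buf := make([]byte, 16)
	p := Pin(&buf[0])
	// ... foreign code would use &buf[0] here ...
	p.Unpin()
	fmt.Println("released:", p.released)
}
```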

I always presumed, pinning implies keeping alive.

I think it's important to consider that aspect separately. For uintptrescapes, the pointer is kept live by virtue of being in a live argument on the stack. Something is actively using that pointer (to the best of our knowledge), so it really only makes sense to keep it live.

In the case of uintptrescapes there is only a uintptr on the stack, but the GC doesn't collect the object, until the function returns, although it's unreachable. No KeepAlive() necessary. So I guess there is more involved, but I haven't checked the code.

For explicit pin/unpin operations, it's much less clear to me. Certainly it would be kept live during the cgo call. But I bet it often makes sense to allocate something, pin it, pass it to cgo, and then just drop it on the floor and let the GC take care of it without worrying about unpinning it. There are other mechanisms to extend its lifetime if that's necessary (e.g., runtime.KeepAlive).

OK, now I get it. So you mean that pinning wouldn't stop the GC from collecting the object once it becomes unreachable. Yeah, that can make sense. But at least having something that I can use to Unpin() would imply that it is kept alive, wouldn't it? I think this question would only matter if we had something like Unpin(unsafe.Pointer) that can be used on pointers to objects that were unreachable for some time.

FWIW, that's why I think the pin itself makes sense as a PointerPin object with its own lifetime, kept alive by a pending call to Unpin. An object can be relocated to the bit-bucket only when the pins attached to it can also be relocated there.

If we wanted to make it even more obvious when users have forgotten to unpin their pinned memory, we could throw an error if a PointerPin object becomes unreachable without first being unpinned.

I like this idea of a PointerPin object to pass to Unpin quite a bit. If nothing else, I think having this new object makes it a bit easier to remember you need to call Unpin.

That said, though I suppose it is a matter of perspective, I view such an API as not keeping pinned memory alive. The pinned object is only kept alive if the PointerPin is kept alive, which IMO is the same thing as requiring users to keep the pinned object alive, just with one extra level of indirection.

rsc commented

OK, so it sounds like maybe people are happy with something like

package runtime

type Pinned struct { ... }
func Pin(object interface{}) *Pinned
func (p *Pinned) Unpin()

and either Pin causes an object to stay live, or it is a crash if the garbage collector collects a pinned object (meaning an Unpin was forgotten). It seems like the former is much more helpful since you can debug it with heap profiles, etc.

Do I have that right?

I'm fine with both options.

For completeness I want to mention another use case that I just encountered and that is not covered by the problem statement above: there are also asynchronous read and write APIs that by definition access the provided buffer after the C function returns. This is a good argument for having explicit unpin functionality, although you could work around implicit scope-based unpinning with a goroutine that keeps a pinning scope alive as long as required.

I apologize if this is off-topic. Given bufferArray [][]byte it's not a problem to call C.func(&bufferArray[i][0], &bufferArray[j][0], &bufferArray[k][0]). Which effectively means that Go would have to commit to not moving the corresponding buffers for the duration of the C.func call. In other words there is an implied pinning mechanism at work here. And I fail to imagine why storing these pointers in a C.struct would void it. What would be problematic is if C.func modified the pointer[s]. But then no amount of pinning would be meaningful. Indeed, if you don't make the assumption that pointers are not modified, how would pinning by itself qualify the call as safe? To summarize, it's not self-evident that "Go memory to which it points does not contain any Go pointers" is actually about pinning. It's rather about mutability. This is not to say that an explicit pinning mechanism would not come in handy, but it would probably be in demand in asynchronous scenarios, as already suggested above.

As for mutability. C provides a way to formulate an immutability contract with const qualifier. And Go could use it to allow calls with Go pointer to Go pointer. (I for one would even argue that it should:-) Note that the referred C.readv does declare iov as a pointer to constant struct, which means that implementation is obliged to commit to not changing any pointers in the corresponding C.struct. And with this in mind, how would the suggested C.readv call be fundamentally different from C.func(&bufferArray[i][0],...)?

@rsc, I'd been imagining for ergonomic reasons that Pinned could pin multiple objects and (*Pinned).Unpin would unpin all of them. Also, people are likely to defer p.Unpin() and it would be much more efficient to enable a single such defer than to encourage multiple defers to unpin multiple objects, since the latter will often disable most defer optimizations.

@dot-asm, you're right that cgo already has pinning behavior. There's been some discussion of this above. It's spread across various comments, but this one is probably the most relevant.

"Go memory to which it points does not contain any Go pointers" is about both pinning and mutability. By surfacing the Go pointers clearly as cgo call arguments, the runtime has a clear place to hook automatic pinning (and unpinning). If we allow passing pointers to pointers, then the runtime may have to recursively traverse these data structures to pin all of the pointers they contain.

If we allow passing pointers to pointers, then the runtime may have to recursively traverse these data structures to pin all of the pointers they contain.

Here is the concern. There is unsafe interface and people shall use it each time they have a problem to solve. This is obviously suboptimal and arguably straight off unsustainable. And what you say above is that the recursion is unsustainable too. But this kind of asks for compromise, i.e. can we discuss and agree on which is less unsustainable? ;-) Or maybe you can compromise and support just one level of indirection? And specifically in a slice (as opposed to lists or something)?

Here I want to again apologize for a possible side track. Feel free to ignore, since it might be just my struggle:-) Anyway, I'd like to suggest considering the following a.go snippet

package foo

type iovec struct {
   base *byte
   len int
}

func bar(iov []iovec) {
   for i := range iov {
       *iov[i].base += 1
   }
}

and examine the output from go tool compile -S a.go. We'll see that the inner loop looks as follows:

        0x0009 00009 (a.go:10)  MOVQ    CX, DX
        0x000c 00012 (a.go:10)  SHLQ    $4, CX
        0x0010 00016 (a.go:10)  MOVQ    (AX)(CX*1), SI
        0x0014 00020 (a.go:10)  MOVBLZX (SI), DI
        0x0017 00023 (a.go:10)  INCL    DI
        0x0019 00025 (a.go:10)  MOVB    DIB, (SI)
        0x001c 00028 (a.go:9)   LEAQ    1(DX), CX
        0x0020 00032 (a.go:9)   CMPQ    BX, CX
        0x0023 00035 (a.go:9)   JGT     9

It is essential to note that this is pretty much what the corresponding C subroutine would look like (when given &iov[0] as argument). More specifically, as if no buffers are moved during its execution. But this is Go binary code, not C. In other words there are times when buffers appear pinned even to Go code(*). So if a C call was made instead of the loop, things would just work out naturally (provided that the immutability contract is honoured of course). Or is it so that C calls are not as straightforward as one would naively imagine and leave the Go caller in a state that allows the garbage collector to intervene? If so, then yes, explicit pinning would be in demand. Though at the same time one can probably argue that there is sufficient metadata available to arrange an implicit one, at least in some specific cases... Or maybe one can arrange an option for the application to tell the runtime "treat this C call as if it's a tight loop in Go [similar to above]" so that the garbage collector is held back? At least I for one would argue that it would be a better option than having to resort to the unsafe interface...

(*) My understanding is that this is the time before the write barrier is checked. But even after the barrier has passed and the garbage collector is running in parallel, it won't be free to move buffers as long as such loops are executing elsewhere, right? Is it safe to assume that movements would have to be performed during another stop-the-world?

rsc commented

@aclements it sounds like you are advocating for:

package runtime

type Pinner struct { ... }
func (p *Pinner) Pin(object interface{})
func (p *Pinner) Unpin()

which would get used as

var p runtime.Pinner
defer p.Unpin()
for lots of things {
    p.Pin(thing)
}

Is that right?

[Updated 10/20 - changed Pinned to Pinner.]
[Updated 10/25 - changed one last Pinned to Pinner.]
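
The usage pattern above can be fleshed out for the iovec case. This is a sketch under assumptions, not a definitive binding: it relies on a Go toolchain whose `runtime` package provides `Pinner` with the `Pin`/`Unpin` methods sketched above, and it avoids cgo so it runs anywhere. The local `iovec` struct and `fakeReadv`, which stands in for a C function such as `readv`, are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// iovec imitates C's struct iovec for illustration only; a real binding
// would use the cgo-generated C.struct_iovec.
type iovec struct {
	base unsafe.Pointer
	len  uintptr
}

// fakeReadv stands in for foreign code: it sees only raw addresses and
// lengths and fills every buffer with a marker byte.
func fakeReadv(iov []iovec) int {
	total := 0
	for _, v := range iov {
		dst := unsafe.Slice((*byte)(v.base), int(v.len))
		for i := range dst {
			dst[i] = 'x'
		}
		total += len(dst)
	}
	return total
}

// readInto pins every non-empty buffer with one Pinner, hands the resulting
// iovec array to the foreign function, and unpins everything on return.
func readInto(bufs [][]byte) int {
	var p runtime.Pinner
	defer p.Unpin()

	iov := make([]iovec, 0, len(bufs))
	for _, b := range bufs {
		if len(b) == 0 {
			continue // nothing to pin or fill
		}
		p.Pin(&b[0])
		iov = append(iov, iovec{base: unsafe.Pointer(&b[0]), len: uintptr(len(b))})
	}
	return fakeReadv(iov)
}

func main() {
	bufs := [][]byte{make([]byte, 3), make([]byte, 5)}
	n := readInto(bufs)
	fmt.Println(n, string(bufs[0]), string(bufs[1])) // 8 xxx xxxxx
}
```

A single deferred `Unpin` covers however many buffers the loop pinned, which is the ergonomic point made above.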

@rsc exactly (maybe it should be Pinner? but whatever)

@dot-asm, I think, at a high level, it's important to recognize that the Go runtime and the compiler are in cahoots here. The generated code can look like that because the compiler knows heap objects won't move and because it generates metadata telling the runtime how to find the pointers being manipulated by that code. The GC could in fact intervene during that code snippet, but the runtime and compiler have a contract that makes that safe (for example, the GC promises not to move the stack in the middle of that snippet, though at other times it can). If the GC moved heap objects, or were generational, etc, the compiler would have to produce different code. (Regarding your footnote, it is possible to have a moving GC that does not stop the world. For example, some of the early work on this was done by the very Rick Hudson who built Go's concurrent garbage collector.)

I'd also like to see some thought given in the API docs to reusing a runtime.Pinned instance multiple times, for GC efficiency.

As an example, I assume when we use this for low-level things (like io_uring) where performance matters, I imagine we'd want to reuse the actual submissions themselves, and presumably the submission struct would include a runtime.Pinned instance to keep stuff alive/pinned while a submission is pending in the kernel. Is that ok?

First of all, thanks! 👍

... it is possible to have a moving GC that does not stop the world.

But at the very least it would have to preempt the thread that works on the objects to be moved(*). Then wouldn't it mean that making a goroutine in a cgo call non-preemptable effectively pins its working set? If so, can it be offered as [a] run-time option for the developer to opt for?

Anyway, could you straighten up one thing? Is it correct understanding that currently Go GC is not moving? So that the suggestion in question is rather about future possibilities?

With this in mind I wonder if unsafe.Pointer would make any sense. I mean it sounds like pinning would have to supersede unsafe.Pointer. But then what would it mean for backward compatibility? Maybe making unsafe.Pointer automatically pinned behind the curtains would be [a] better path forward?

(*) Well, it might be possible to pull it off with transactional memory, but it's not a universal option, hence we don't consider it.

But at the very least it would have to preempt the thread that works on the objects to be moved(*).

That is not the only approach, and while I am not an expert it doesn't strike me as a likely approach. I think a more likely approach is an optional read barrier, just as we already have an optional write barrier. If the read barrier is turned on, then memory loads from the heap would be coordinated with the GC.

Is it correct understanding that currently Go GC is not moving? So that the suggestion in question is rather about future possibilities?

That is correct: the current Go GC does not move objects. What we are discussing here is an API that will not prevent us from doing that in the future.

With this in mind I wonder if unsafe.Pointer would make any sense. I mean it sounds like pinning would have to supersede unsafe.Pointer. But then what would it mean for backward compatibility? Maybe making unsafe.Pointer automatically pinned behind the curtains would be [a] better path forward?

I'm not sure what you mean here. From the GC perspective an unsafe.Pointer is exactly like any other pointer. If we have a moving GC, then whatever we do to make ordinary pointers work will also work for unsafe.Pointer.

If the read barrier is turned on, then memory loads from the heap would be coordinated with the GC.

And the only way to coordinate the sample I suggested would be to claim a mutex in each iteration of the inner loop and block GC on it. I reckon it would be too costly. It would be more efficient to simply give control to GC upon barrier check and assume that when control is regained all the dust is settled. It can count as "preemption" too, a cooperative one.

I'm not sure what you mean here. [with unsafe.Pointer vs. pinning]

My view is probably skewed, but in my mind unsafe.Pointer is exclusively about passing it to outside Go. At least Go itself has no use for it, right? Now, this means that whenever we convert a pointer to unsafe, we actually assume that the object is not garbage-collected nor moved. And I'd argue that the last part of this assumed contract is sufficiently "natural" to commit to implicitly. So why would one need an additional interface to meet a requirement that is assumed to be met already? Or conversely, if there is a pinning interface, what would we need unsafe.Pointer for? Which is why I view it as unsafe.Pointer vs. pinning. With preference for former for better backward compatibility. (Yeah, I know, who am I to judge? Just a cent, feel free to ignore:-)

in my mind unsafe.Pointer is exclusively about passing it to outside Go. At least Go itself has no use for it, right? Now, this means that whenever we convert a pointer to unsafe, we actually assume that the object is not garbage-collected nor moved.

unsafe.Pointer is used in many situations where no Cgo is involved.

unsafe.Pointer is used in many situations where no Cgo is involved.

OK. Can you give an example in which unsafe.Pointer would not be assumed to be pinned?

In the context of the project, in the runtime, unsafe.Pointer conversion shows up >1500 times, an example is in the map iteration type which you would not expect to pin parts. Outside the project, I use unsafe regularly for type punning with no intention that the value be pinned.

Cool! Thanks! While it does address the "exclusively" part in the original "exclusively about passing it to outside Go," I don't feel that it invalidates the point I'm trying to get across. Indeed, let's flip the question and ask if there are occasions when you'd need to pin an object without having to pass its pointer to outside Go? If no, then wouldn't it be better if pinning was implicit, again, for backward compatibility sake. As mentioned earlier by @aclements, compiler and runtime are in cahoots, and it should be possible to figure it out. When pinning would be necessary, foremost in cgo call, and just make the arrangements behind the curtains that is... Just a thought...

Thoughts are going all over the place and it's late hour for me... But as for "behind the curtains" part. Can we at least agree that pinning of objects not containing Go pointers will remain implicit in cgo call? (Yeah, it's kind of "selfish" question, apologies for that:-)

And the only way to coordinate the sample I suggested would be to claim a mutex in each iteration of the inner loop and block GC on it.

That turns out not to be the case. With both a read and a write barrier, there is no need for a mutex that blocks GC. A read barrier is certainly a performance cost. But it's not a mutex and it doesn't prevent parallel execution by the program and the GC.

ask if there are occasions when you'd need to pin an object without having to pass its pointer to outside Go?

I don't know of any. But note that "outside Go" isn't restricted to cgo. For example, using io_uring in Go programs.

wouldn't it be better if pinning was implicit, again, for backward compatibility sake

Certainly pointers passed directly to cgo are going to remain pinned. Otherwise, as you suggest, we would lose backward compatibility. What we are discussing here is pointers passed indirectly to cgo, or pointers that need to be pinned for other reasons.

Certainly pointers passed directly to cgo are going to remain pinned. Otherwise, as you suggest, we would lose backward compatibility.

Fantastic! Keep this thought;-)

What we are discussing here is pointers passed indirectly to cgo, or pointers that need to be pinned for other reasons.

Yes, absolutely! And with this in mind let's ask ourselves how it works now. You invoke some unsafe magic and it works, works 100% reliably, as long as a) the other side, be it cgo, io_uring, or anything of the kind, doesn't mess up Go pointers; and b) the GC is non-moving. Now, the moment b) is not true, no amount of the current unsafe magic will help. Programs will break, and will have to be modified, and in a non-backward-compatible fashion(*). And here is what I'm trying to get to: unless the compiler and run-time figure it out (at least for the most common cases) and just make the corresponding arrangements. Figure it out using unsafe.Pointer, or maybe some other backward-compatible idiom, as a hint.

(*) Let's also ask ourselves what would modifications look like given the suggestion? To me it sounds like one would omit all the unsafe.Pointers and pin the stuff. This is why I refer to them as "vs."

You invoke some unsafe magic and it works, works 100% reliably

What is this "magic" you are talking about? As far as I am aware, there is no magic involved in unsafe.Pointer. The "unsafe" only refers to type safety, a Go pointer stored in an unsafe.Pointer is still 100% safe regarding memory management. It's like a void shared pointer in C++, unsafe regarding type, safe regarding memory.

Now, the moment b) is not true, no amount of the current unsafe magic will help. Programs will break, and will have to be modified, and in a non-backward-compatible fashion(*).

I'm quite sure this is not accurate. The moment a moving GC would be introduced, all the "legal" places, where Go pointers are passed to non-Go code (that is for example as arguments of Cgo functions or functions with the go:uintptrescapes compiler directive) would also add implicit pinning, which is a completely Go internal change. All Go programs that are following the pointer passing rules would continue to work without any change.

What we are discussing here is adding a public API for explicit pinning, because the pointer passing rules are too restrictive for some C APIs that users would want to use: iovec, async APIs, etc.

What is this "magic" you are talking about?

The one you listed in the beginning of this thread. Admittedly, "magic" might be a too strong word, it's not as "magic" as I make it sound. Nevertheless, the point is that there are ways around the current limitations, and they (at least some) involve unsafe.Pointer transmutations. And they do work (due to b) on the Go side). And they will stop. And I'm wondering if they actually have to.

What is this "magic" you are talking about?

The one you listed in the beginning of this thread. Admittedly, "magic" might be a too strong word, it's not as "magic" as I make it sound. Nevertheless, the point is that there are ways around the current limitations, and they (at least some) involve unsafe.Pointer transmutations. And they do work (due to b) on the Go side). And they will stop. And I'm wondering if they actually have to.

Oh, now I see the misunderstanding. So, you are talking about the first three workarounds that start with "Break the rules..."? Of course they stop working when the GC becomes moving, because... they are breaking the rules. But the use of unsafe.Pointer() in these workarounds is purely for type punning, there is no other effect to it. You can perfectly break the rules also without using unsafe.Pointer, like in this example:

package main

/*
#include <stdio.h>

typedef struct {
        int *i;
} T;

T c_data;

inline void mycall(T* p) {
        printf("int: %d\n", *p->i);
        return;
}
*/
import "C"

func main() {
        p := &C.c_data
        i := C.int(42)
        p.i = &i
        C.mycall(p)
}

This will also break, once the GC is moving. No unsafe.Pointer involved.

You can perfectly break the rules also without using unsafe.Pointer, like in this example:

To be honest, I'm shocked. I would expect the compiler to actually reject the p.i = &i assignment(*). Or at least flag it as questionable during go vet... But let me get it straight. Is the suggestion to legitimize this kind of coding practice? If so, then I for one would argue that a more sustainable path forward would be rather to make this kind of assignment illegal and instead concentrate on facilitating passing Go pointers to Go pointers. Well, again, who am I to judge, just a thought :-) Thanks!

(*) more specifically without unsafe.Pointer conversion

To be honest, I'm shocked. I would expect the compiler to actually reject the p.i = &i assignment(*).

(*) more specifically without unsafe.Pointer conversion

Reject why? The types match. If you cast it to unsafe.Pointer, the types would no longer match, and the compiler would bail out. But if the types match, how can the compiler know that the memory of c_data, which is a link to another package, is allocated by C and not by Go? The Go runtime can know, though, which is why running this program with GODEBUG=cgocheck=2 panics with

write of Go pointer 0xc000186000 to non-Go memory 0x410e150
fatal error: Go pointer stored into non-Go memory

If so, then I for one would argue that a more sustainable path forward would be rather to make this kind of assignment illegal and instead concentrate on facilitating passing Go pointers to Go pointers.

This assignment is illegal at the moment. There is a difference between what is illegal and what a compiler or linter can catch. For example, even the best compiler cannot find out whether, and for how long, an asynchronous C API keeps a Go pointer in order to store a future result in it. Furthermore, we need a legal way to make these kinds of assignments in order to use certain C APIs efficiently. That's why there is no way around explicit pinning if we want to keep the possibility of a moving GC in the future.

Reject why? ... it panics with ["Go pointer stored into non-Go memory"]

That's why. I fail to imagine that compiler actually wouldn't be able to figure it out at compile (or at least vet) time. It apparently doesn't, and I'd say it's on compiler:-(

There is a difference between what is illegal and what a compiler or linter can catch.

I find this formulation really strange. Customarily compiler defines what's illegal. Well, specification does, but usually it's one of compiler's responsibility to "enforce the law", is it not?

we need a legal way to make these kinds of assignments in order to use certain C APIs efficiently

I for one would argue that yes, there definitely should be a legal way to use the said C APIs, but it doesn't actually have to be through these kinds of assignments.

Note that it's not like I fail to see the value in explicit pinning. I just see great value even in extending the implicit pinning. By "extending" I mean that we already established that objects without Go pointers [passed by reference to cgo as argument] shall be pinned implicitly, and the question is if one can support other cases, presumably selected ones. I mean it's surely infeasible to just support a general case, but why not a slice of C.structs with pointers to Go objects without Go pointers? Thanks for listening! 👍

rsc commented

Lots of discussion about what unsafe.Pointer means, but I don't see any objections to the API in #46787 (comment).

Does anyone object to that runtime.Pinner API?

One small question, based on the most recent edit I see (2021-10-20) is it to be called "Pinner" or "Pinned"? In the example it's now func (p *Pinned) Unpin() but still appears to be Pinner everywhere else?

@rsc I find the sequence var p runtime.Pinner; defer p.Unpin(); p.Pin(…) a bit strange. It feels like the Pinner outlives the Unpin call, it begs the question if it is legal to re-use the Pinner and/or what happens if you call p.Pin after p.Unpin. It also reads weird as the "undo" action appears lexically before the action to be undone and it might require a bit of effort to parse which pins exactly will be unpinned.

Just FWIW (I have no strong opinions as I don't predict I'll be ever using this API), it would also be possible to allow multiple objects to be pinned in one go via

func Pin(obj ...interface{}) *Pinned
func (*Pinned) Unpin()

to be used as

p := runtime.Pin(manyThings...)
defer p.Unpin()

(or even defer runtime.Pin(manyThings...).Unpin())

The only argument against that I can see is that it might be less efficient, if it requires allocating a slice - but maybe that can be solved with inlining/escape analysis?

Just in case. Is it plausible to expect that pinned pointers won't trigger "cgo argument has Go pointer to Go pointer" and similar panics? (BTW, since it's on it checking, what would prevent it from pinning objects as it goes?;-)

it begs the question if it is legal to re-use the Pinner and/or what happens if you call p.Pin after p.Unpin

I think it should be legal to reuse a Pinner, mostly because I don't see a reason it shouldn't be. :)

func Pin(obj ...interface{}) *Pinned

Explicit pinning (and having an object that represents the pinned set) seems particularly useful in situations where the number of objects to pin isn't known statically. For example, when you need to loop over a slice of objects and collect up all of the pointers in them. Certainly that would be possible with the API you proposed, but even a very clever compiler would have a hard time eliminating the slice allocation from that.

I can confirm that the proposed API in #46787 (comment) covers perfectly all the use-cases that I dealt with so far, which are:

  • iovec syscalls, which require iterating over all the buffer pointers and pin them with the same Pinner object
  • asynchronous C APIs, which require to store the Pinner object accessible from a callback that can release it later.

In both cases the Pinner must be stored. So the nice looking and concise expression from @bcmills proposal defer Pin(objPtr).Unpin() couldn't be used in either of them.
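For the asynchronous case, a rough sketch of "storing the Pinner" might look like this. It assumes the `runtime.Pinner` API as it later shipped in Go 1.21; the `request` type, the `inflight` registry, and the `submit`/`complete` functions are hypothetical names, and the C side is left out entirely (in real code `complete` would be reached from an exported callback).

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// request keeps a buffer and the Pinner that pins it together until the
// (hypothetical) asynchronous operation has finished.
type request struct {
	buf    []byte
	pinner runtime.Pinner
}

var (
	mu       sync.Mutex
	inflight = map[uintptr]*request{} // keyed by a token that would be handed to C
	nextID   uintptr
)

// submit pins buf and registers the request; the returned token stands in
// for the user-data value an async C API would echo back on completion.
func submit(buf []byte) uintptr {
	r := &request{buf: buf}
	if len(buf) > 0 {
		r.pinner.Pin(&r.buf[0])
	}
	mu.Lock()
	defer mu.Unlock()
	nextID++
	inflight[nextID] = r
	return nextID
}

// complete is what the completion callback would call: it looks up the
// request by token and releases its pins. It reports whether the token
// was known.
func complete(id uintptr) bool {
	mu.Lock()
	r, ok := inflight[id]
	delete(inflight, id)
	mu.Unlock()
	if !ok {
		return false
	}
	r.pinner.Unpin()
	return true
}

func main() {
	id := submit(make([]byte, 512))
	fmt.Println(complete(id), complete(id)) // second call finds nothing
}
```

The pin outlives the function that created it, which is why a `defer Pin(objPtr).Unpin()` one-liner does not fit this pattern.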

And in the end this is a niche API, that will be used in rare cases, and it's not so dramatic if its use is a bit more quirky than usual.

@phlogistonjohn

One small question, based on the most recent edit I see (2021-10-20) is it to be called "Pinner" or "Pinned"? In the example it's now func (p *Pinned) Unpin() but still appears to be Pinner everywhere else?

No, it's the other way around, @rsc changed it from Pinned to Pinner, according to @aclements proposal, but he forgot it in the last line, which still says Pinned. So, it should be Pinner everywhere.

Just in case. Is it plausible to expect that pinned pointers won't trigger "cgo argument has Go pointer to Go pointer" and similar panics? (BTW, since it's on it checking, what would prevent it from pinning objects as it goes?;-)

@dot-asm You are right, with the default cgocheck=1 these checks are done anyway and could also pin the nested pointers, instead of panicking. But it would only cover the iovec use cases, not the async ones. And then you would need to do the same check again on these pinned Pointers and so on, which can sum up. Imagine you accidentally pin a binary tree. And although the docs state that these checks are "reasonably cheap", you can still opt out of them with cgocheck=0. And what then? Then they are not Pinned anymore, breaking the code basically? Code must be valid independently of the cgocheck value.

This raises an interesting question: how does this new API affect the pointer passing rules?

Instead of talking about Go pointers and C pointers, the rules could talk about fluid and static pointers, defining C pointers and pinned Go pointers as static pointers, and unpinned Go pointers as fluid pointers.

And will it be legal to pin Go pointers that point to memory that contain fluid pointers in the first place?

you can still opt out of them with cgocheck=0. And what then? Then they are not Pinned anymore, breaking the code basically? Code must be valid independently of the cgocheck value.

I would argue that if the GC were moving, it would be reasonable to make implicit pinning independent of the cgocheck value. (Recall that we already established that there will be some implicit pinning going on, and it will have to be non-optional.) The concern was that since the checks are performed by default, it would take extra work to tell apart pinned and unpinned pointers. To a degree that auto-pinning might be more efficient;-)

BTW, can we agree on the following? Since the cgocheck variable is something that can be set only at application startup, it's on the application user to set it. And it's not exactly appropriate to expect that the user will make an adequate choice. The application developer should have all the tools needed to make the application work regardless of user settings. It's actually more complicated than that. A module developer should have the tools to make the module work in somebody else's application context, without having to impose anything as specific as cgocheck on the final application's developer.

But it would only cover the iovec use cases, not the async ones.

Yes, and my point is that if iovec-like cases are supported by implicit pinning, it would take you sufficiently far to make the effort worthwhile. Recall that I'm not suggesting that there should only be implicit pinning, just that having some would be [very?] useful.

Imagine you accidentally pin a binary tree.

It would be the developer's responsibility to figure it out and resolve the problem one way or another. And there should be legitimate means for solving it. Well, to a degree. I mean it's probably unrealistic to guarantee that there will be a solution for an arbitrary problem, and developers would have to deal with limitations, whatever they turn out to be. However, I would argue that it would be detrimental to leave illegitimate ways to circumvent the limitations open. Like the one I was shocked about earlier. (Yes, it was illegitimate according to the rules by @ianlancetaylor, even though types matched.) In other words the legitimate ways should be enforceable.

With both a read and a write barrier, there is no need for a mutex that blocks GC. A read barrier is certainly a performance cost. But it's not a mutex and it doesn't prevent parallel execution by the program and the GC.

I'm not sure I follow, but maybe we are talking past each other. Would you agree that when an object is being moved, the code referencing it can't actually execute? And once it resumes the execution all local copies of the relevant pointers would have to be externally adjusted by the GC? And for this last part to happen GC would have to either synchronize with target code one way or another, or be able to fix up registers in suspended thread's processor context?

@dot-asm As someone reading along, I really don't understand what you are trying to achieve, ultimately. You seem to agree that we need an explicit pinning API for use-cases like io_uring, correct? So, you are not actually arguing against this proposal?

You also seem to argue in favor of more "implicit pinning" and here, I'm less clear about your goals. Is it your opinion that we can still keep the option of a moving GC open for the future, even with relaxed rules about passing pointers to cgo? Because that may be so, but I don't see how this is relevant to this discussion. ISTM that "we should relax the rules" should be a separate proposal, then - having an explicit pinning API doesn't prevent that.

It also doesn't prevent us from implementing more intelligent detection mechanisms for implicitly pinning things when we get a moving GC and need to implement implicit pinning. But, again, that seems to be a discussion for a different time - specifically, the time we consider adding a moving GC. And nothing that needs to be litigated in this issue.

Yeah, I recognize that my delivery is somewhat chaotic. I apologize for that. As for the questions, "correct" to the first two, and as for the 3rd one, my opinion is more of a "should"/"ought" than "can." Because there is legitimate need for code that goes beyond passing pointer-free objects. As for relevance. It might be my flawed thinking, but I fail to view the two, pinning and passing pointers to cgo, as two disjoint issues. Indeed, we already established that pinning is ultimately about moving GC in the context of communication with non-Go code such as cgo. Now, in my mind it would be more appropriate to at least sketch new rules and then consider an interface that would facilitate the cases that would be impossible to cover by the new rules. In other words, I'm not convinced that the suggested interface would be in harmony with possible future. But hey, it's not your job to convince me:-) So let me say it again, feel free to disregard my remarks:-)

@dot-asm

In other words, I'm not convinced that the suggested interface would be in harmony with possible future.

I see little evidence of that. Given that we agree that the use-case of io_uring at least can't be solved without an explicit pinning API, we know the requirements that such an API would need to fulfill no matter what: the ability to explicitly declare the lifetime of a pin and the ability to add new pins dynamically over time. Those are exactly the requirements that led us to this API, though. So, even if we had other mechanisms available, we'd probably design the same API, given that those mechanisms won't cover this use-case.

I also don't see any reason to believe that the introduction of this API would prevent us from adopting other mechanisms. After all, a future relaxation of the pointer-passing-rules would only ever decrease the need for this API. Any user who adopted this API to solve things covered by the relaxed rules could just remove it again.

So, no matter which way we consider the future, I don't see a reason to believe that the introduction of this API in any way influences or is influenced by changes to pointer-passing-rules.

But hey, it's not your job to convince me:-) So let me say it again, feel free to disregard my remarks:-)

The goal of the proposal process is to reach consensus. Continuing to post contrarian comments makes it appear that consensus has not been reached. Which is why I'm trying to reconcile that. And why I'm arguing that, if you want to propose changes to the pointer-passing-rules, or want to propose mechanisms to better detect them, you should do so in a new issue, to indicate that we reached consensus here.

rsc commented

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@dot-asm

With both a read and a write barrier, there is no need for a mutex that blocks GC. A read barrier is certainly a performance cost. But it's not a mutex and it doesn't prevent parallel execution by the program and the GC.

I'm not sure I follow, but maybe we are talking past each other. Would you agree that when an object is being moved, the code referencing it can't actually execute? And once it resumes the execution all local copies of the relevant pointers would have to be externally adjusted by the GC? And for this last part to happen GC would have to either synchronize with target code one way or another, or be able to fix up registers in suspended thread's processor context?

This should probably be discussed somewhere else, such as golang-nuts, not on this issue. It's not related to this proposal.

package runtime

type Pinner struct { ... }
func (p *Pinner) Pin(object interface{})
func (p *Pinner) Unpin()

@rsc, @aclements, while I was implementing this interface as a PoC, a further question came up: You are suggesting an interface{} as argument. I get that this is more comfortable to use, because it doesn't require any type punning. However, isn't that more expensive, because all the dynamic type structures need to be created for the call? I would naively have used unsafe.Pointer as argument, since I assume that in most cases where one would use pinning, the unsafe package is imported anyway. Probably it doesn't make a real difference in the end, and it doesn't matter, but I thought we should at least touch on that point briefly before wrapping up.

I'm not sure I agree that people using a Pinner will routinely import "unsafe". The additional type information is constructed statically by the compiler, so the extra cost of interface{} should be minimal. Of course we'll want to make sure that the interface value does not escape, but that seems feasible.

rsc commented

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Awesome, thanks! 🥳

Is it on me, who filed the issue, to implement it? I'm happy to do that, I just might need some guidance how to disable cgocheck for pinned pointers.

Or would it be better that someone from the runtime/GC team implements this?

It is not on you to implement this. If you want to implement it, though, that would be great. But this is not a trivial change. We'll need to efficiently track pinned pointers in some way.

Cool. As I said, I'm happy to try it, if I can get some direction what "some way" might be. Like some similar functionality in existing code I can look into, or general ideas and concepts. Maybe it would be good to line out the pinning process in pseudo code, so I don't forget an important step?

Or should I just push a stub implementation, and we iterate over it in the code review?

I don't know of any similar functionality. It's complicated. Pushing a stub implementation won't be helpful.

I think the key step here is that cgoCheckPointer and the functions that it calls must not complain about pinned pointers.

I also looked at the code yesterday for a bit. My current approach would be to change cgoIsGoPointer to cgoIsUnpinnedGoPointer, because it's exclusively called from cgocheck code. About how to mark a pointer as pinned I saw two options so far:

  • like cgo.Handle we could use a global sync.Map for registering all pinned pointers and keeping them alive as suggested before (or uintptr, if we don't want to keep them alive). I guess as long as the map is empty the performance impact for the default cgocheck=1 would be negligible. However, I have no idea how expensive it is if it's not empty, and if it would be acceptable.
  • similar to runtime.mspan.special we could add a list of pinned objects in the span. That might be the appropriate place anyway, when the GC needs to know about it later. I guess getting the span of a pointer and iterating through a list of a few pinned objects is cheaper than a sync.Map lookup, but I'm not sure, especially if - as I assume - we need to serialize the list access.

Thoughts?

I don't think the sync.Map approach will work well, because 1) I think that if a runtime.Pinner is garbage collected, we should explicitly unpin the pointers; 2) I don't think the performance hit of changing cgocheck to look up pointers in a sync.Map will be acceptable.

I don't know about the mspan.special approach, that might work.

ad 1) why is a sync.Map registry and unpinning in a finalizer a contradiction? We could even let it panic, if a pinned Pinner is collected, as @bcmills suggested here, in order to educate people not to forget the unpinning. Note that the map would keep the pinned pointer alive, not the pinner itself. And actually that's not even necessary, because the references in the pinner itself would keep it alive, so the map could be a uintptr -> nil map.
ad 2) yeah, that's what I thought too. and I guess a normal map with a sync.RWMutex wouldn't perform any better? However, you also don't seem completely definite. Maybe someone here has a stronger feeling about that. Otherwise maybe some benchmarking could help giving us a better understanding?

I want to share some benchmarks that I wrote in order to get a rough idea about the performance impact that we are talking about:

% /Users/svanders/sdk/go1.17.3/bin/go test -run=^$ -bench . pinnerbenchmark -benchtime 1s -c
goos: darwin
goarch: amd64
pkg: pinnerbenchmark
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkCCall0NoCgoCheck       19683577    55.13  ns/op
BenchmarkCCall0                 21328222    55.03  ns/op
BenchmarkCCall1NoCgoCheck       18208592    60.48  ns/op |
BenchmarkCCall1                 16470044    74.33  ns/op |
BenchmarkCCall4NoCgoCheck       16455018    63.66  ns/op |--> ~14.3ns / ptr
BenchmarkCCall4                  9597632   121.3   ns/op |
BenchmarkSyncMap                92286345    13.34  ns/op     +93% / ptr
BenchmarkMap                   220817404     5.263 ns/op     +36% / ptr
BenchmarkMutexMap               80391705    14.63  ns/op    +102% / ptr
BenchmarkRWMutexMap             81198786    14.75  ns/op    +103% / ptr
BenchmarkSpecials               51773179    23.25  ns/op    +163% / ptr
BenchmarkSpecialsWithoutLocks  275029039     4.281 ns/op     +30% / ptr

The "CCall" benchmarks are measuring the base costs of the call of a very simple C function and the costs of cgocheck=1 depending on number of pointer arguments. They show that cgocheck=1 adds about 14.3ns per pointer argument to the call costs on my computer.

The other benchmarks measure the costs of different lookups of non-existing keys/flags. The maps are not empty (10 entries, but that's irrelevant). BenchmarkSpecials basically calls removespecial from runtime/mheap.go for a non-existing special on an empty special list, in which case it's basically the same as a lookup. This function synchronizes with the GC and uses a lock, therefore it's quite slow. If we can avoid a lock for reading a list of pinned pointers, and only lock on write access and change the list with atomic operations, then it might be as cheap as BenchmarkSpecialsWithoutLocks, which basically is just the cost of getting the span for the pointer with spanOfHeap() and checking that the list pointer is nil.

The percentages are the increase of cost per pointer relative to the base cost per pointer. In general the values are all roughly in the same ballpark, but of course cgocheck=1 would get more expensive. However, with a real "payload" in the C call that cost might get negligible quite fast? With a simple snprintf() the C call already takes 140ns, making the 13ns from a sync.Map lookup not look very dramatic. So, what additional cost would be acceptable?

The linked list in the span structure would obviously be the cheapest (if we can avoid the locks), but it would increase the size of the runtime.mspan struct. Not sure if that is acceptable either.

Is it maybe also an option to move the checks to cgocheck=2, like the "store Go pointer in C memory" test?

If we think that the common case is that the checked pointers are not pinned, and there are no other special objects in its span, then the fast path could be as simple as finding the span and checking that the specials list is empty with an atomic.Loadp. Only if there are >0 specials in the span do we need to do more work (grab a lock, ...).
There's even a pageSpecials bitmap we could use.

@randall77

If we think that the common case is that the checked pointers are not pinned, and there are no other special objects in its span, then the fast path could be as simple as finding the span and checking that the specials list is empty with an atomic.Loadp. Only if there are >0 specials in the span do we need to do more work (grab a lock, ...). There's even a pageSpecials bitmap we could use.

Unfortunately, after thinking about it, I believe it's exactly the other way around: We would only check pinning status for nested Go pointers, and if they are not pinned, it panics as it does now already. So, in working code the pinning must always be set, when we check for it. But the good news is, only if there are nested Go pointers present we will have the extra cost. So you could say that's the extra cost, that comes with the possibility to pass nested Go pointers, and if you don't like it, set cgocheck=0. But we need to optimize for the pinned case. (At least for the cgocheck code.) Or did I forget something?

I guess you're right, if we don't need to check top-level pointers then the thing we're checking already being pinned is the common case.

The problem with reusing the specials list is, that the GC seems to modify it without lock while sweeping, so you have to synchronize with the GC and acquire the specials lock. If we would use a separate list for pinned pointers, I guess we could realize lock-free iteration.

Speaking of bitmaps: a pinnedBits *gcBits would certainly be the most performant, if we can afford the memory.

I wrote an implementation that uses a pinnedBits *pBits bitmap in the mspan struct for marking pinned objects. The bitmap is only allocated when the first object of a span is pinned. (I hope it's legal to allocate bitmaps after the span init, but at least it works.)

Costs if pinning is not used:

  • Memory cost is one pointer per span.
  • Runtime cost is zero, because the pinning is only checked in cases, where it would otherwise panic anyway.

Costs if pinning is used:

  • The first pin in a span allocates a bitmap with (nelems / 8) bytes.
  • Each nested pointer in a C call argument additionally costs about 3-4 ns on my computer, that's about a third of the cost of a top level pointer argument.
  • The only lock that is used is when the bitmap is allocated during the first pin in a span. I shared that lock with the specialslock, because this happens only once per span with at least one pinned object. So I guess it's fine to serialize specialslock and bitmap creation, thereby saving the memory for an additional lock per span.
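The scheme in the list above can be modelled outside the runtime roughly like this. It is a user-space sketch, not the code from the CL: `span`, `pin`, `unpin` and `isPinned` are stand-in names, a `sync.Mutex` plays the role of the shared specials lock, and the bitmap uses 32-bit words (so it rounds nelems/8 bytes up to a multiple of four) to keep bit updates atomic with `sync/atomic` types from Go 1.19.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// span is a toy stand-in for runtime.mspan: nelems equally sized objects.
type span struct {
	nelems int
	mu     sync.Mutex                      // taken only to allocate the bitmap
	pinned atomic.Pointer[[]atomic.Uint32] // nil until the first pin in this span
}

// bitmap returns the pin bitmap, allocating it on first use.
func (s *span) bitmap() []atomic.Uint32 {
	if p := s.pinned.Load(); p != nil {
		return *p // fast path: no lock once the bitmap exists
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if p := s.pinned.Load(); p != nil {
		return *p // another goroutine allocated it first
	}
	bits := make([]atomic.Uint32, (s.nelems+31)/32)
	s.pinned.Store(&bits)
	return bits
}

// pin marks object idx as pinned.
func (s *span) pin(idx int) {
	word, mask := &s.bitmap()[idx/32], uint32(1)<<(idx%32)
	for {
		old := word.Load()
		if word.CompareAndSwap(old, old|mask) {
			return
		}
	}
}

// unpin clears the pin bit of object idx, if a bitmap exists at all.
func (s *span) unpin(idx int) {
	p := s.pinned.Load()
	if p == nil {
		return
	}
	word, mask := &(*p)[idx/32], uint32(1)<<(idx%32)
	for {
		old := word.Load()
		if word.CompareAndSwap(old, old&^mask) {
			return
		}
	}
}

// isPinned is the lock-free lookup a cgocheck-like caller would use.
func (s *span) isPinned(idx int) bool {
	p := s.pinned.Load()
	if p == nil {
		return false // span never had a pinned object: one pointer load
	}
	return (*p)[idx/32].Load()&(uint32(1)<<(idx%32)) != 0
}

func main() {
	s := &span{nelems: 100}
	fmt.Println(s.isPinned(5)) // no bitmap allocated yet
	s.pin(5)
	fmt.Println(s.isPinned(5), s.isPinned(6))
	s.unpin(5)
	fmt.Println(s.isPinned(5))
}
```

A single bit per object cannot record how often an object was pinned, so this sketch simply ignores a double pin; that is one of the open behavioural questions raised next.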

While implementing, a couple of behavioural decisions came up, that I want to raise here:

  • Global objects and zero size objects: should an attempt to pin them panic or be ignored? Are nested pointers to these allowed in C call arguments? I guess both are implicitly pinned anyway, correct?
  • Double pin: What should happen, when the object is already pinned? Panic or ignore? This might also happen when several pointers to the same object are pinned, like different fields of the same struct or different elements of the same array.
  • Pinner leaks: What should happen, when the GC collects an unreachable Pinner that still holds pinned pointers? Silently unpin them or panic? (At the moment my implementation panics.)

In general I would default to panics, because they give feedback to the author about possible issues. But with the double pin I'm not so sure if it might become annoying.

As always: feedback is highly appreciated. Thanks! 😊

Change https://golang.org/cl/367296 mentions this issue: runtime: implement Pinner API for object pinning