proposal: runtime: add AlignedN types that can be used to increase alignment

Question

ianlancetaylor opened this issue 8 years ago · 65 comments

The sync/atomic package has this in the docs in the "Bugs" section: "On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned." This makes it difficult to use atomic operations in types that may not necessarily be at the beginning of an allocated struct or slice. For example, sync.WaitGroup does this:

	// 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
	// 64-bit atomic operations require 64-bit alignment, but 32-bit
	// compilers do not ensure it. So we allocate 12 bytes and then use
	// the aligned 8 bytes in them as state.
	state1 [12]byte

and this:

func (wg *WaitGroup) state() *uint64 {
	if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
		return (*uint64)(unsafe.Pointer(&wg.state1))
	} else {
		return (*uint64)(unsafe.Pointer(&wg.state1[4]))
	}
}
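
For readers who want to try the trick outside the sync package, here is a minimal self-contained sketch. The counter type and its methods are hypothetical names made up for illustration; only the 12-byte layout idea comes from the WaitGroup code above. It assumes the struct is at least 4-byte aligned, which the uint32 field ensures.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// counter is a hypothetical type that reuses the WaitGroup trick.
type counter struct {
	pad    uint32   // puts state1 at offset 4, so it may not be 8-byte aligned
	state1 [12]byte // given 4-byte alignment, these 12 bytes contain an 8-byte-aligned run of 8 bytes
}

// state returns a pointer to the 8-byte-aligned word inside state1.
func (c *counter) state() *uint64 {
	if uintptr(unsafe.Pointer(&c.state1))%8 == 0 {
		return (*uint64)(unsafe.Pointer(&c.state1))
	}
	return (*uint64)(unsafe.Pointer(&c.state1[4]))
}

// add atomically adds delta to the counter and returns the new value.
func (c *counter) add(delta uint64) uint64 {
	return atomic.AddUint64(c.state(), delta)
}

// aligned reports whether the word chosen by state is 8-byte aligned.
func (c *counter) aligned() bool {
	return uintptr(unsafe.Pointer(c.state()))%8 == 0
}

func main() {
	c := new(counter)
	fmt.Println(c.aligned(), c.add(3))
}
```

The cost is four wasted bytes per value plus an address check on every access, which is the awkwardness this proposal is trying to remove.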

Further, on x86 there are vector instructions that require alignment to 16 bytes, and there are even some instructions (e.g., vmovaps with VEX.256) that require 32 byte alignment. While those instructions are not currently generated by the gc compiler, one can easily imagine using them in assembler code, which will require the values to be appropriately aligned.

To permit programmers to force the desired alignment, I propose that we add new types to the runtime package: runtime.Aligned2, runtime.Aligned4, runtime.Aligned8, runtime.Aligned16, runtime.Aligned32, runtime.Aligned64, runtime.Aligned128. (We could also use bit values, giving us runtime.Aligned16 through runtime.Aligned1024, if that seems clearer.)

These types will be identical to the type struct{} except that they will have the alignment implied by the name. This will make it possible to write a struct as

type vector struct {
    vals [16]byte
    _ runtime.Aligned16
}

and ensure that instances of this struct will always be aligned to a 16 byte boundary.

It will be possible to change sync.WaitGroup to be

type WaitGroup struct {
    noCopy noCopy
    _ runtime.Aligned8
    state uint64
    sema uint32
}

simplifying the code.

Although this functionality will not be used widely, it does provide a facility that we need today without requiring awkward workarounds. The drawback is the addition of a new concept to the runtime, though I think it is fairly clear to those people who need to use it.

Another complexity is that we will have to decide whether the size of a value is always a multiple of the alignment of the value. Currently that is the case. It would not be the case for the runtime.AlignedN values. Should it be the case for any struct that contains a field with one of those types? If the size of a value is not always a multiple of the alignment, we will have to modify the memory allocator to support that concept. I don't think that will be particularly difficult, but I haven't really looked.
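
The current invariant (size is a multiple of alignment) can be observed with unsafe.Sizeof and unsafe.Alignof. A small sketch; the sample type and the sizeAndAlign helper are illustrative names, not part of the proposal.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sample is a hypothetical struct whose fields occupy 5 bytes.
type sample struct {
	a uint32
	b uint8
}

// sizeAndAlign reports the size and alignment the compiler chose for sample.
func sizeAndAlign() (size, align uintptr) {
	var s sample
	return unsafe.Sizeof(s), unsafe.Alignof(s)
}

func main() {
	size, align := sizeAndAlign()
	// The 5 bytes of fields are padded so that size is a multiple of align.
	fmt.Printf("size=%d align=%d remainder=%d\n", size, align, size%align)
}
```

A zero-size AlignedN value on its own would be the first kind of value for which this remainder is not meaningful, which is why the question above needs an answer.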

Answer 1 · 2017-02-13T18:41:09.000Z

A few minor comments:

If we do this, we should probably do #17586 too. It's not just cmd/vet that relies on go/types for information about what cmd/compile will do alignment-wise. Another example is wasted.
Your struct vector above will have a byte of padding at the end to avoid GC problems. You probably want the struct{} field as the first field. And we'll want to carefully document that recommendation. I mention this only because if you missed this detail, others definitely will.
This could also be done with a //go: annotation on the type. We already have other //go: annotations on types. Don't panic. I'm not suggesting we do this. (Even the mere mention of annotations tends to raise banshee-level howling.) But it does raise the general question about when and why to use magic embedded types vs annotations vs perhaps some other general mechanism that doesn't exist yet.

Answer 2 · 2017-02-13T19:18:26.000Z

It's true that we do have go:nointerface and go:notinheap annotations on types, but neither is documented and the latter is clearly only for the runtime. I suppose my general reaction to support //go: annotations on types can be summed up as https://www.youtube.com/watch?v=hulm_T_xnwY .

Answer 3 · 2017-02-13T19:59:41.000Z

@ianlancetaylor when you wrote

(We could also use runtime.Aligned16, etc., if that seems clearer.)

I don't see how this differs from the previous sentence.

Answer 4 · 2017-02-13T20:03:28.000Z

@cespare I meant to imply using bit values rather than byte values. Updated original comment to clarify. Those names would mean that the AlignedN names would correspond to intN names.

Answer 5 · 2017-02-13T20:30:09.000Z

I agree we should fix this problem. I am less certain about how to fix it. Perhaps to start we should just align 64-bit integers to 64-bit addresses on 32-bit platforms. It's called out in sync/atomic because it's basically a bug on our side, one that we've just not fixed.

The solution proposed here essentially assumes the compiler will not reorder fields, at least not if these tags are present. I don't think we've fully closed the door on that (#10014). In that issue (two years ago), I argued that it is important for the programmer to have control over locality, so wholesale reordering of fields is not great (for example, sort by size and then lay out would give optimal packing but I think be too invasive).

At the same time, I am getting tired of looking for uint32-sized or bool-sized holes when adding fields to existing structs, and even more I am getting tired of being forced to choose between "understandable struct definition" and "small-in-memory struct definition". I do wonder if the compiler should be able by default to sift individual small fields up into gaps that would otherwise go unfilled, but not otherwise reorder the definitions. This is getting off-topic for this issue, except that any such scheme would need an override annotation for cgo and so that might give a mechanism for expressing alignment as well; of course any reordering would need to keep alignment in mind. I don't have any good ideas.

Also not every variable that needs alignment is a field in a struct. I'm not sure what to do about that either. Code might declare 'var x [16]byte' and want to pass it to something that requires 16-byte alignment, for example. Maybe that's getting too far ahead of ourselves, but it's worth keeping in mind.

I had hoped that alignment would be a property of a type, not a specific declaration. Are there cases where that's not tenable?

Answer 6 · 2017-02-13T20:55:42.000Z

runtime.AlignedCache (which could just alias an appropriate AlignedN type) would help address #19025.

Answer 7 · 2017-02-14T00:05:48.000Z

I did not mean to imply that this approach meant that the compiler could not reorder fields. The language spec does not state that the alignment of one field in a struct implies anything at all about the alignment of subsequent fields in the struct. I envision any uses of this as being of the form

type Vec struct {
    _ runtime.Aligned16
    b [16]byte
}

Here we know that any instance of Vec is aligned to a 16-byte boundary.

I agree that alignment should be a property of a type, and I believe that that satisfies all alignment needs in Go. The question is how you specify that alignment, and whether it can be done without using a magic comment. This proposal is one approach: in effect, you can only specify the alignment of struct types, and you do so by adding a field of type runtime.AlignedN.

For comparison, in C (with GCC extensions) you specify alignment of a type by writing

typedef T ... __attribute__((aligned(n)));

You can also specify alignment of a specific variable in the same way. Or, you can implement alignment for a specific memory allocation by using memalign or posix_memalign (since C's memory allocator does not understand types, it is generally necessary to use these functions when allocating memory for an aligned type).
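
In Go today, the closest analogue to memalign is to over-allocate and skip the misaligned prefix. A minimal sketch follows; alignedBytes and misalignment are hypothetical helper names. It relies on the gc runtime not moving heap objects, which the language does not guarantee, and the noinline directive is there to keep the buffer on the heap.

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignedBytes returns n bytes (n > 0) whose first element sits at an address
// that is a multiple of align, which must be a power of two.
// It over-allocates by align-1 bytes and skips the misaligned prefix.
//
//go:noinline
func alignedBytes(n, align int) []byte {
	buf := make([]byte, n+align-1)
	addr := uintptr(unsafe.Pointer(&buf[0]))
	off := int(-addr & uintptr(align-1)) // bytes needed to reach the next boundary
	return buf[off : off+n : off+n]
}

// misalignment reports the address of b[0] modulo align.
func misalignment(b []byte, align int) int {
	return int(uintptr(unsafe.Pointer(&b[0])) % uintptr(align))
}

func main() {
	b := alignedBytes(100, 64)
	fmt.Println(len(b), misalignment(b, 64))
}
```

This only helps for byte buffers handed to assembly or C. It does nothing for typed values, which is the gap an alignment property on types would fill.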

Clearly for Go it would be nicer to be able to declare the alignment of any type, rather than this proposal which in effect only permits you to declare the alignment of a struct. If we can figure out a way to do that, we should. But I'd really rather not do it via a magic comment.

Answer 8 · 2017-02-14T00:16:06.000Z

For sync/atomic, the biggest problem is usually not the alignment of the whole structure (because we have at least 8 byte alignment on struct larger than 8 byte), but ensuring that the next uint64 will be aligned correctly within the struct. A nice solution is to just increase alignment of 64-bit types on 32-bit architectures. For vectors, I think once we figure out how to use SIMD intrinsics, we will have types of 128-bit, 256-bit and 512-bit alignment.

Answer 9 · 2017-02-14T00:44:10.000Z

For sync/atomic, the biggest problem is usually not the alignment of the
whole structure (because we have at least 8 byte alignment on struct larger
than 8 byte), but ensuring that the next uint64 will be aligned correctly
within the struct.

Yes, understood. With this proposal you write "the next uint64" as a field of type alignedUint64, a struct defined as

type alignedUint64 struct {
    _ runtime.Aligned8
    v uint64
}

Vectors and sync/atomic are not the only uses of aligned types, so I think a more general solution would be a good idea if we can find one.

Answer 10 · 2017-02-14T00:46:26.000Z

Here's a slight variation on the original proposal:

Add runtime.Align2, ... runtime.Align128. (Note: "align", not "aligned")
These are just like struct{}, except that if a runtime.AlignN is used as a struct field, it specifies the alignment of the following field (in declared order).
If a runtime.AlignN is declared as the last struct field, it specifies the alignment of the entire struct.

This gets to @minux's point about aligning particular struct fields. You could even use multiple AlignNs to align multiple fields in a single struct.

(AlignN is more like a magic comment than AlignedN is, but at least it isn't a comment.)

Answer 11 · 2017-02-14T01:15:39.000Z

@ianlancetaylor Thanks for the clarification. I was slightly confused by the fact that in your original post 'struct vector' (really should be 'type vector struct') puts the alignment after the field, not before.

Answer 12 · 2017-02-14T01:45:02.000Z

On Feb 13, 2017 7:46 PM, "Caleb Spare" wrote:

Here's a slight variation on the original proposal:

Add runtime.Align2, ... runtime.Align128. (Note: "align", not "aligned")
These are just like struct{}, except that if a runtime.AlignN is used as a struct field, it specifies the alignment of the following field (in declared order).
If a runtime.AlignN is declared as the last struct field, it specifies the alignment of the entire struct.

I think any AlignN will set the alignment for the whole structure. The logic is this: In order to ensure N byte alignment for a particular field, we must first ensure at least N byte alignment for the whole structure and then make sure the field has N byte aligned offset within the structure. I understand this proposal diverges from existing C convention, but I think it's more intuitive to the programmer.

And we can also add a magical AlignCacheLine type to force at least cache line alignment in order to reduce false sharing (I think this is better than #19025).

In order to not make the types too magical from the language specification's perspective, I suggest that we make them like this:

// the only magical and unexported type, only allowed as underlying type of an array,
// the resulting array has the alignment as the size of the array
type alignedbyte byte

// a 16-byte aligned type which takes 16-byte of space.
type aligned16 [16]alignedbyte

// zero size, can be embedded into other structs to force alignment of the next field
// (and also the whole structure)
type Align16 [0]aligned16

This takes advantage of recently clarified spec #18950, regarding alignment of [0]T.

Answer 13 · 2017-02-14T01:57:02.000Z

If a runtime.AlignN is declared as the last struct field, it specifies the alignment of the entire struct.

Zero-sized final fields cause cmd/compile to insert a padding byte at the end. I imagine that is unacceptable in many of the cases for which increasing alignment is important.

Answer 14 · 2017-02-14T02:22:02.000Z

As Minux pointed out (hard to tell with the Github-mangling of the email reply), AlignN declared anywhere would end up specifying a min alignment for the entire struct, since a struct can't have alignment less than any of its fields.

Answer 15 · 2017-02-14T02:56:07.000Z

Thanks, Russ, I missed that.

Clever, Minux. Seems like it might be worth accepting it is a language change and defining unsafe.AlignedByte and letting folks take it from there.

Answer 16 · 2017-02-14T04:31:28.000Z

alignedbyte is clever but I'm a little uncomfortable with the idea that [13]alignedbyte has an alignment requirement of 13 bytes.

Answer 17 · 2017-02-14T04:44:01.000Z

Perhaps it could specify a minimum alignment, and compilers could choose to round up to the nearest power of two.

Answer 18 · 2017-02-14T05:24:01.000Z

Regarding Minux's idea:

type Align16 [0]aligned16 // zero size, can be embedded into other structs
to force alignment of the next field (and also the whole structure)

To clarify, isn't there "magic" required to guarantee this relationship with the next field (same as in my AlignN version), if the compiler can reorder fields?

Answer 19 · 2017-02-14T05:28:20.000Z

The reason my original comment didn't export the alignedbyte type is because I only intend the runtime package to instantiate power-of-two sized array of it and then expose zero sized array of those types. I don't think we need to fully expose the magical alignedbyte type to the user (as a language extension.)

Answer 20 · 2017-02-14T05:31:49.000Z

@josharian "minimum alignment" is not a well-defined concept unless the space of possible requests are all multiples or divisors of each others. A multiple of 16 is not a multiple of 13.

I tend to agree with Ian that alignedbyte is a little too much rope.
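
The arithmetic behind that objection, as a small sketch (roundUpPow2 and isMultiple are illustrative helpers, not proposed API): rounding a request of 13 up to 16 yields addresses that need not be multiples of 13.

```go
package main

import "fmt"

// roundUpPow2 returns the smallest power of two that is >= n (for n >= 1).
func roundUpPow2(n uintptr) uintptr {
	p := uintptr(1)
	for p < n {
		p <<= 1
	}
	return p
}

// isMultiple reports whether addr is a multiple of n.
func isMultiple(addr, n uintptr) bool {
	return addr%n == 0
}

func main() {
	rounded := roundUpPow2(13)
	fmt.Println(rounded)
	// Address 16 satisfies the rounded alignment...
	fmt.Println(isMultiple(16, rounded))
	// ...but not the literal request for 13-byte alignment.
	fmt.Println(isMultiple(16, 13))
}
```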

Answer 21 · 2017-02-14T05:37:27.000Z

I think the compiler definitely cannot reorder arbitrary structures. Perhaps we can embed a unsafe.PackedStruct to signify that the compiler can rearrange a struct, but I still think packing a struct is better done by another program, not automatic by the compiler.

One question for the runtime.AlignN idea: could it be used to reduce the alignment? I.e. can it be used to pack a struct?

type packed struct { // size 5 (instead of 8)
    uint8
    _ runtime.Align1
    uint32 // at offset 1 (instead of 4 and leave 3 byte padding)
}

Packing a struct is an occasionally requested feature (and it will actually help cgo w.r.t. complex types), but supporting that on some architectures means non-atomic accesses to non-naturally-aligned fields.

Answer 22 · 2017-02-14T05:37:51.000Z

Ack.

The thing I am struggling with is that once you start introducing magic, it is unclear which the right magic is. A magic field type in package runtime? A comment-based type annotation? An interpreted field tag (or type tag)? A magic interface in package runtime (check whether a type has an Aligned2 method)?

This does feel a bit like a language change, though, and unsafe does seem like the right home for manually messing with alignments. Maybe there's an alternative unsafe formation that provides less rope? Here's a terrible idea to start: unsafe.AlignedShift: [n]unsafe.AlignedShift has alignment 1<<n. :)

Answer 23 · 2017-02-14T06:10:27.000Z

I think the compiler definitely cannot reorder arbitrary structures.

What you think is only a little interesting; telling us why is much more interesting.

Perhaps we can embed a unsafe.PackedStruct to signify that the compiler can rearrange a struct, but I still think packing a struct is better done by another program, not automatic by the compiler.

That's fine for structs that people don't look at. What bothers me most about packing structs explicitly (by hand or with a program) is that doing so rewrites the source code to be less readable.

Answer 24 · 2017-02-14T14:14:15.000Z

Allowing the compiler to reorder structures would be tricky: we may need to distinguish between unoptimized layouts (which the compiler should obviously fix) and hand-optimized layouts.

If the author has intentionally adjusted cache-line locality or packed the struct to match a kernel or C data structure, how do we tell the compiler not to break that?

Answer 25 · 2017-02-15T15:01:21.000Z

@bcmills, struct field reordering is #10014. When I mentioned it above I wrote:

This is getting off-topic for this issue, except that any such scheme would need an override annotation for cgo and so that might give a mechanism for expressing alignment as well.

Answer 26 · 2017-02-15T15:08:13.000Z

@rsc, it seems you used to oppose the idea of the compiler reordering fields. Why have you come to think that field reordering is beneficial now? #10014 (comment) And I still agree with your comment that packing the struct is solving the 1970 problem, and the 2010 problem can't be solved by the compiler, at least not alone.

Answer 27 · 2017-02-15T15:49:06.000Z

I would guess that most uses of alignment fall into one of two categories:

Passing a pointer to an unexported field to an atomic function.
Passing a struct to a C function (via cgo or a syscall).

We normally let the compiler figure out details of allocation and layout. Perhaps we could do the same for alignment most of the time.

We could do something akin to escape analysis to see which fields need to be aligned:

If a pointer to a field is passed as a function parameter that requires a particular alignment, then both the field and the struct require at least that alignment. Pointers passed to functions in package atomic must be aligned to their element size.
If a pointer to a field or struct is passed to a cgo function call, syscall, or converted to an unsafe.Pointer which may be passed to a cgo function call or syscall, then the field and the struct require C-compatible alignment.

For the few remaining cases (are there any?), perhaps we could add a no-op function call (akin to runtime.KeepAlive):

package runtime

// Align marks its argument as requiring the given alignment.
// The ptr argument must be a pointer to a variable of a struct type,
// or a pointer to a field on a variable of a struct type. 
// The alignment argument must be a compile-time constant.
func Align(ptr interface{}, alignment int)

The only situation I'm aware of that would require explicit calls to Align would be if a pointer is allocated and returned from a function in one package but the alignment constraints occur only in the calling package. I cannot think of any examples of such usage at the moment.

Answer 28 · 2017-02-15T15:54:18.000Z

@bcmills a remaining case: argument to user-written assembly routine using vector instructions. (Minux mentioned this above too.)

Answer 29 · 2017-02-15T16:03:52.000Z

Introducing a function to adjust alignment of fields looks wrong to me as they're operating at fundamentally different layers. Too bad that we didn't reserve some struct tags to the language. An elegant solution could be:

type T struct {
    data [16]uint8 `go:"align16byte"`
}

But essentially that will be a language change.

Answer 30 · 2017-02-15T16:06:24.000Z

@josharian Wouldn't vector assembly functions be amenable to the same kind of escape-analysis? But I suppose it's nontrivial to figure out which arguments propagate to a given assembly instruction.

We already pretty clearly have a bias toward //go: comments to annotate constraints on assembly functions. You mentioned a //go: comment on types earlier, but perhaps it belongs on the assembly function declarations instead?

//go:align64 ptr
func someAssemblyFunc(ptr unsafe.Pointer, offset int)

Answer 31 · 2017-02-15T16:19:22.000Z

But to do that kind of analysis, we must first see the whole program. And even that is not enough, because people can use interfaces to hide the struct from the linker. To put it another way, letting the import and use of package B on types from package A affect the size and alignment of types in package A (when A doesn't depend on B) is not acceptable (and in fact, not possible to implement in the general case).

Answer 32 · 2017-02-15T16:28:52.000Z

But to do that kind of analysis, we must first see the whole program.

Part of my point is that a bottom-up analysis (like we already do for heap escapes) would suffice in the vast majority of cases.

That is: I agree that it is possible, in principle, that a package B might import A and use fields from a struct defined in A in a way that requires a particular alignment. I disagree that that should affect the alignment of package A: either A should already be using those fields in a way that requires that same alignment, or the compiler should generate an error ("b.go:123: call to someFunction requires 16-byte alignment, but A.SomeStruct.X is only 8-byte aligned").

That's where runtime.Align would come into play: if there is some such pair of packages B and A, A would need to call runtime.Align on the relevant fields before allowing the value to escape from the package. But I am not aware of any examples of such packages B and A in practice. The cases I've seen that require alignment are generally all within the same package (e.g. a method calling an atomic function on an unexported field).

Could you give some concrete examples of packages with this sort of inverted cross-package alignment constraint?

Answer 33 · 2017-02-15T16:41:49.000Z

For example, package A exports a set of structs for common metrics, but doesn't provide update methods for them, and another package B uses sync/atomic to update the metrics (embed metrics type defined in package A in type defined in B). Then memory layout of types in A depends on whether you import package B or not, and I argue that's a bad thing.

Answer 34 · 2017-02-15T16:44:38.000Z

package A exports a set of structs for common metrics, but
doesn't provide update methods for them, and another package B uses
sync/atomic to update the metrics

Yeah, don't do that. Is this a hypothetical problem, or do you have a concrete example of this pattern?

Then memory layout of types in A depends on whether you import package B or
not, and I argue that's a bad thing.

Under the analysis I'm suggesting, the layout of types in A does not depend on whether you import B. If A does not provide the correct alignment, compilation of B would fail with an error.

Answer 35 · 2017-02-15T18:33:12.000Z

@bcmills I think your suggestion requires us to be able to determine the alignment of a type used by a C function. I don't know how we can do that. Required type alignment is not exposed in DWARF.

Answer 36 · 2017-02-16T04:31:49.000Z

Required type alignment is not exposed in DWARF.

Hmm, good point. Still, I think the key insight holds: if we annotate function parameters at the points at which values escape the Go runtime, then there doesn't necessarily need to be any annotation on the types themselves. Perhaps that would imply the need for //go:alignN comments on functions which make cgo calls, but that isn't obviously worse than embedding tag-structs.

Plus, with the parameter-annotation approach we can match the actual alignment of the type to its usage: we can detect alignment errors at compile time. With the embedded tag-struct approach, it is not obvious to me that we can do any better than receiving a fatal signal at run-time.

Answer 37 · 2017-02-16T04:57:50.000Z

In all honesty I think that being able to specify the alignment for a type is easier to understand, closer to what people expect, and less likely to have obscure errors.

Answer 38 · 2017-02-16T17:30:23.000Z

The immediate concern is uint64 not being uint64-aligned on 32-bit systems. It probably should be. Assuming we do that, then maybe we can leave the bigger alignments until we understand the context in which it is needed.

Maybe we should put this proposal on hold?

Answer 39 · 2017-02-16T17:34:42.000Z

I realize that I have a secondary unstated issue, which is for gccgo. gccgo uses the platform ABI for alignment, so I don't want to simply change the alignment of Go types. That means that I need some mechanism in gccgo to ensure that certain types are aligned as needed for atomic operations. But we can put this on hold for gc and I can invent something for gccgo.

Answer 40 · 2017-02-16T18:03:24.000Z

If you'd like to experiment in gccgo, maybe start with //go:align N applying to the next declaration (N = bytes), whether that's a type declaration or a struct field declaration? That avoids showing up at runtime (like a field tag) and also adding new API (like new runtime types).

Answer 41 · 2017-04-10T15:53:24.000Z

On hold per discussion above.

Answer 42 · 2017-04-20T00:57:23.000Z

CL https://golang.org/cl/41143 mentions this issue.

Answer 43 · 2017-12-26T04:18:01.000Z

My concern with the language changes is that they make it hard to have source code that compiles in older versions of Go. The benefit of a comment (e.g. //go:align cache) is that it can be ignored by Go versions which cannot interpret it. I currently have libraries that compile as far back as Go 1.4 (github.com/ugorji/go/codec). I would like to leverage a better cache-line alignment model than the hacked (_ [N]byte // padding) I have all around the place.

Answer 44 · 2018-10-09T18:49:39.000Z

@dvyukov points out in https://go-review.googlesource.com/c/go/+/138076/3/src/runtime/mheap.go#146 that we've in fact broken cache line padding in the runtime for various arrays because we only align the size of the array element to a multiple of the cache line size, but have no guarantee that it will start on a cache line. As a result, neighboring elements can alias to the same cache line. Currently we can only solve this by adding a full cache line's worth of padding between each element, which is wasteful and can needlessly split elements across cache lines. Having a way to indicate that these arrays must be cache-line aligned would be a much better solution.
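
The aliasing is easy to see with address arithmetic alone. A sketch, assuming a 64-byte cache line; linesSpanned and sharesLine are illustrative names and operate on plain numbers rather than real runtime data.

```go
package main

import "fmt"

const cacheLine = 64 // assumed cache line size in bytes

// linesSpanned returns the first and last cache line index touched by element i
// of an array that starts at address base and has elemSize-byte elements.
func linesSpanned(base, elemSize, i uintptr) (first, last uintptr) {
	start := base + i*elemSize
	end := start + elemSize - 1
	return start / cacheLine, end / cacheLine
}

// sharesLine reports whether elements i and i+1 touch a common cache line.
func sharesLine(base, elemSize, i uintptr) bool {
	_, lastI := linesSpanned(base, elemSize, i)
	firstNext, _ := linesSpanned(base, elemSize, i+1)
	return lastI == firstNext
}

func main() {
	// Elements padded to 64 bytes, array starting on a line boundary.
	fmt.Println(sharesLine(0, 64, 0))
	// Same elements, but the array starts 32 bytes into a line.
	fmt.Println(sharesLine(32, 64, 0))
}
```

Padding the element size fixes only the first case. The second needs the array's start address to be aligned as well.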

Answer 45 · 2019-04-29T10:57:11.000Z

One motivation for taking another look at this issue could be the assembler's support of AVX-512 added in Go 1.11. AVX-512 code operating on 512 bit registers works best when the data on which it operates is 64 byte aligned, as each unaligned access is a cache-line split. As there doesn't currently seem to be a way to specify the alignment of slices and arrays, it's difficult to take full advantage of AVX-512 in Go without relying on a custom allocator or adding potentially unnecessary peeling code to the AVX-512 algorithm to handle leading unaligned data.

To test this out I created a simple AVX-512 function in Go assembler that adds one array of 32 bit integers to another. The function naively assumes that the input and output arrays are the same size and that this size is divisible by 16.

// func sum(a []int32, b []int32)
TEXT ·sum(SB), NOSPLIT, $0
	MOVQ a_base+0(FP), SI
	MOVQ a_len+8(FP), DX
	MOVQ b_base+24(FP), DI

loop1:
	VMOVDQU32 -64(SI)(DX * 4), Z25
	VMOVDQU32 -64(DI)(DX * 4), Z26
	VPADDD Z25, Z26, Z25
	VMOVDQU32 Z25, -64(SI)(DX * 4)
	SUBQ $16, DX	
	JNE loop1
	RET

I then created aligned and unaligned versions of a benchmark to test this code. The unaligned version looked something like this (I've edited out some initialisation code)

//go:noescape
func sum(in []int32, out []int32)

type container struct {
	in  [10 * 1024]int32
	out [10 * 1024]int32
}

var c container

func BenchmarkAlign(b *testing.B) {
	for n := 0; n < b.N; n++ {
		sum(c.in[:], c.out[:])
	}
}

In my first test, c.in and c.out happened to be 32 byte and not 64 byte aligned. I forced them to be 64 byte aligned by preceding them with a _ [32]int8, e.g.,

type container struct {
	_   [32]int8
	in  [10 * 1024]int32
	out [10 * 1024]int32
}

and re-ran the benchmark. Doing so improved the speed of the benchmark by 1.33x on my i9-7900X.

Of course, there's no guarantee that the addition of a 32 byte array to the start of a structure will always provide 64 byte alignment for the subsequent element. Therefore, it would be much nicer if I could write

type container struct {
	_   runtime.Aligned64
	in  [10 * 1024]int32
	out [10 * 1024]int32
}

Note I was also able to get the structure to be 64 byte aligned by simply deleting a fmt.Printf debug statement in the initialisation code. Doing so removed the fmt import, changed the alignment of c, and improved the speed of my benchmark by 33%.
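
The fragility can be made concrete with a sketch like the one below (paddedContainer, offsets and misalignment are names added here for illustration). The padding field only moves in and out relative to the start of the struct. Where the struct itself lands is decided by the linker or allocator, so the address modulo 64 can change from build to build.

```go
package main

import (
	"fmt"
	"unsafe"
)

type container struct {
	in  [10 * 1024]int32
	out [10 * 1024]int32
}

type paddedContainer struct {
	_   [32]int8
	in  [10 * 1024]int32
	out [10 * 1024]int32
}

var c container

// offsets returns the byte offsets of in and out within paddedContainer.
func offsets() (in, out uintptr) {
	var p paddedContainer
	return unsafe.Offsetof(p.in), unsafe.Offsetof(p.out)
}

// misalignment reports the addresses of c.in and c.out modulo 64.
func misalignment() (in, out uintptr) {
	return uintptr(unsafe.Pointer(&c.in[0])) % 64, uintptr(unsafe.Pointer(&c.out[0])) % 64
}

func main() {
	fmt.Println(offsets())
	// The two arrays are 40960 bytes apart, a multiple of 64,
	// so they always share the same misalignment.
	fmt.Println(misalignment())
}
```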

Answer 46 · 2019-08-19T15:10:08.000Z

This is probably an awful idea, but what about struct tags?

type fussy struct {
    dontCare uint64
    alignMe uint64 `align:32`
}

A naive analysis suggests that this struct would necessarily have a size of 64 bytes. If the struct's total size is not a multiple of 32 bytes, then an array of these would meet the alignment requirement only sometimes. If the offset of alignMe in the struct is not a multiple of 32 bytes, then it won't be aligned unless the struct itself is specifically not aligned.
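
That analysis can be reproduced with the usual C-style layout rules. A sketch, where field, alignUp and layout are illustrative helpers and the 32-byte alignment is taken from the hypothetical tag rather than from anything the compiler supports:

```go
package main

import "fmt"

// field describes one struct field by size and required alignment, in bytes.
type field struct {
	size, align uintptr
}

// alignUp rounds n up to the next multiple of align (a power of two).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// layout places fields in declared order and returns each field's offset
// together with the struct's total size and alignment.
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
	align = 1
	for _, f := range fields {
		size = alignUp(size, f.align)
		offsets = append(offsets, size)
		size += f.size
		if f.align > align {
			align = f.align
		}
	}
	return offsets, alignUp(size, align), align
}

func main() {
	// dontCare uint64, then alignMe uint64 with a requested 32-byte alignment.
	offsets, size, align := layout([]field{{8, 8}, {8, 32}})
	fmt.Println(offsets, size, align) // prints "[0 32] 64 32"
}
```

Only 16 of the 64 bytes hold data. The rest is the padding implied by the tag.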

... Now that I've written it down, I vote against it. I think it's more Go-like to require that the padding be explicit, using _ members.

Possible idiom:

type runtime.Aligned struct{}
type fussy struct {
    dontCare uint64
    _ [32]runtime.Aligned
    alignMe uint64
}

(and if you do [17]runtime.Aligned, the compiler comes after you with a lead pipe.)

Answer 47 · 2019-08-19T16:29:47.000Z

What about allowing tags to be applied to structs as well as fields? These could then also be surfaced via reflect which may be useful.

We would then have something like:

type WaitGroup struct `runtime:"nocopy,align64"` {
    state uint64
    sema uint32
}

Answer 48 · 2020-06-12T20:57:47.000Z

Change https://golang.org/cl/237737 mentions this issue: syscall: add Get/Set methods to Stat_t.Size, Flock_t.{Start,Len}

Answer 49 · 2021-04-09T22:00:03.000Z

Change https://golang.org/cl/308971 mentions this issue: cmd/compile: add internal/align package for runtime

Answer 50 · 2021-04-14T11:53:06.000Z

One other thought that's probably horrible, but I'll leave it here anyway: how about making runtime.Aligned work by using the size of the array that it's part of (Aligned would be zero sized) ?

type WaitGroup struct {
    _ [8]runtime.Aligned
    state uint64
    sema uint32
}

That allows any alignment to be specified without introducing a zillion new types.

Edit: I see that's pretty similar to this.

For the [17]runtime.Aligned case, maybe it's reasonable to give the compiler the freedom to arbitrarily increase the alignment if it's not compatible with hardware constraints. So on a machine that didn't allow unaligned word access, it could round up to the nearest word-size multiple.

Answer 51 · 2021-04-16T22:09:31.000Z

how about making runtime.Aligned work by using the size of the array that it's part of (Aligned would be zero sized) ?

This is how internal/align.elemT works in https://golang.org/cl/308971.

Answer 52 · 2023-07-04T14:16:39.000Z

Sounds like an interesting topic, and someone gave a talk at GopherCon EU 2023.
Any update on this proposal?

Answer 53 · 2024-08-02T12:56:28.000Z

I have a use case that I don't see mentioned here. I'm working on a concurrent hash map and it requires a guaranteed alignment of 64 for the buckets that it allocates internally. I'm storing these buckets in a huge array where a bucket pointer is stored together with a [0, 64) 6-bit integer. I need guaranteed alignment so that I can pack the 6-bit integer into the pointer value like so:

// unsafe.Pointer value is legal as long as it points _somewhere_ into the object. Because
// our buckets always have sizeof >= 64B (and alignment of 64) it means that we can pack the
// 6-bit value into the lowest 6 bits so that the resulting pointer will continue to point into
// the original allocated object.
packed := unsafe.Add(unsafe.Pointer(bucket), smallInt)
atomic.StorePointer(&arrayElement.ptr, packed)

// and unpacking
packed := atomic.LoadPointer(&arrayElement.ptr)
bucket := (*Bucket)(unsafe.Pointer(uintptr(packed) &^ 0b11_1111))
smallInt := uint64(uintptr(packed) & 0b11_1111)

For my benchmark cases the allocator just happens to give the buckets alignment of 64 but I really need the guarantee as this packing produces a very measurable performance boost on multiple metrics compared to storing both values in their individual fields. I would like to see this proposal advance.

Answer 54 · 2024-08-05T06:18:40.000Z

I have a use case where I wish to use the last bit of the address of a generic struct for pointer tagging. The structs are stored in an array, and I will use uintptr values pointing to array elements to perform the pointer tagging.

type SometimesAligned[T constraints.Unsigned] struct{
    a,b,c T //constraints.Unsigned just limit to all unsigned int types.
}

Now SometimesAligned[byte] is the only instantiation that's aligned to 1 byte and thus isn't safe to use with pointer tagging. Using

type Aligned0[T constraints.Unsigned] struct{
    a,b,c T 
    _ uint16
}

gives me 2-byte alignment but introduces unnecessary wasted space. It'll waste even more space on larger types.

type Aligned1 struct{
    values [2]uint16
}

only wastes 1 byte of space and essentially has the same capability as SometimesAligned[byte]. However, it's hard to make this generic.

I wish to avoid the wasted space as much as possible while keeping the generic capability because these structs can be created in large numbers in arrays.

Answer 55 · 2024-08-05T12:36:02.000Z

Does a _ [0]uint16 field work for you?

Answer 56 · 2024-08-05T22:53:22.000Z

I'm interested in this for the purpose of mapping shared memory (named memory in Windows, memory mapped file in Linux). Being able to map structs to memory requires alignment.

This is a snippet of what I am using at the moment:

type Header struct {
	Status [4]byte // UTF-8 string

	Version uint32

	Revision uint32

	// The unix time when the last update to the data occurred.
	// Get int using GetLastUpdate.
	LastUpdate [8]byte

	// Offset of the Sensor section from beginning of Header.
	Offset uint32
}

func (info Header) GetLastUpdate() int64 {
	return int64(binary.LittleEndian.Uint64(info.LastUpdate[:]))
}

uint32 is aligned at the moment but this might change, uint64 is not, it has padding, so I'm using [8]byte.

Answer 57 · 2024-08-06T01:46:35.000Z

Does a _ [0]uint16 field work for you?

Thank you. It looks like

type Aligned[T constraints.Unsigned] struct{
    _ [0]uint16
    a,b,c T
}

works ideally, but I kind of feel like it's a bit unintuitive.
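For anyone comparing the three layouts discussed above, unsafe.Sizeof and unsafe.Alignof make the difference visible. This is a small sketch; Unsigned is a local stand-in for constraints.Unsigned so the program needs only the standard library, and Plain, Padded and sizeAlign are names made up for the illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Unsigned is a local stand-in for constraints.Unsigned.
type Unsigned interface {
	~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// Plain has the alignment of T, so Plain[byte] is only 1-byte aligned.
type Plain[T Unsigned] struct {
	a, b, c T
}

// Padded forces 2-byte alignment with a real uint16 field, which costs space.
type Padded[T Unsigned] struct {
	a, b, c T
	_       uint16
}

// Aligned forces 2-byte alignment with a zero-length array, which adds no
// field storage of its own; only the struct size is rounded up to a multiple
// of the alignment.
type Aligned[T Unsigned] struct {
	_       [0]uint16
	a, b, c T
}

// sizeAlign reports the size and alignment of a value of type T.
func sizeAlign[T any]() (size, align uintptr) {
	var v T
	return unsafe.Sizeof(v), unsafe.Alignof(v)
}

func main() {
	fmt.Println(sizeAlign[Plain[byte]]())     // 3 1
	fmt.Println(sizeAlign[Padded[byte]]())    // 6 2
	fmt.Println(sizeAlign[Aligned[byte]]())   // 4 2
	fmt.Println(sizeAlign[Aligned[uint32]]()) // 12 4
}
```

The zero-length array goes first on purpose: it only raises the alignment, and types that are already at least 2-byte aligned, such as Aligned[uint32], pay nothing for it.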

Answer 58 · 2024-08-06T03:16:15.000Z

@G-M-twostay If this proposal is accepted, you will be able to write _ runtime.Aligned2 instead.

Answer 59 · 2024-08-06T03:19:00.000Z

@MatthiasKunnen Are you saying your LastUpdate field contains a 64-bit integer, but is not 64-bit aligned? Are you trying to match a native Windows API that behaves like that? Or is there another reason you want non-64-bit aligned 64-bit integers?
This proposal won't help that situation (if I'm interpreting your situation correctly). It will only increase alignment, not decrease it.

Answer 60 · 2024-08-06T13:48:26.000Z

@randall77 LastUpdate contains a 64-bit integer but using uint64 as the type takes more space than 64 bits. It was a while ago but I believe this was due to padding or something the compiler does for optimization.

This is a program that shares data by sharing a part of its memory that is packed according to a documented structure.

In essence, I'm trying to mirror C# code like this:

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct SharedMemory {
    public UInt32 version;
    public UInt8 rev; // This must be byte in Go because, I believe, uint8 still takes 32 bits
    public long poll_time; // This must be [8]byte in Go because uint64 takes more than 64 bits
};

Answer 61 · 2024-08-06T14:34:31.000Z

Not to turn this issue into my personal blog, but I have to report my ever-increasing thirst for this feature. My concurrent hash map has a len atomic.Int64 field and, as one would expect, it's getting hammered by all CPU cores on the insert path, causing false sharing and whatever else on the other fields. Manually adding padding so that the field is on its own 128-byte chunk, like the following, has quite the impact on performance.

type Map[K comparable, V any] struct {
	//...
	pad [128 - 40]byte
	len atomic.Int64
}

old: BenchmarkInsert/size=100000-8 40.77 ns/insert
new: BenchmarkInsert/size=100000-8 27.99 ns/insert

I have zero (0, nil) interest in manually maintaining the correct amount of padding. So I need runtime.Aligned64 to make the map work correctly and runtime.AlignedCacheLine to keep it simple and fast.
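For reference, the manual version of this can at least be made less fragile by giving each hot counter its own fixed-size padded type, so the padding does not depend on what precedes it. This is only a sketch: the 128-byte figure is taken from the comment above as an assumption, and paddedCounter, stats and counterStride are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// cacheLinePad is an assumed padding granularity, not a value queried from
// the hardware.
const cacheLinePad = 128

// paddedCounter occupies exactly cacheLinePad bytes: an 8-byte atomic.Int64
// followed by filler, so two adjacent counters never share a 128-byte chunk
// when the enclosing allocation itself starts on such a boundary.
type paddedCounter struct {
	n atomic.Int64
	_ [cacheLinePad - 8]byte
}

// stats is a stand-in for the hot fields of a concurrent map.
type stats struct {
	inserts paddedCounter
	deletes paddedCounter
}

// counterStride reports the distance in bytes between the two counters.
func counterStride() uintptr {
	var s stats
	return unsafe.Offsetof(s.deletes) - unsafe.Offsetof(s.inserts)
}

func main() {
	var s stats
	s.inserts.n.Add(1)
	s.deletes.n.Add(2)
	fmt.Println(counterStride(), s.inserts.n.Load(), s.deletes.n.Load()) // 128 1 2
}
```

Note the limitation: this fixes the spacing between fields but not the alignment of the struct as a whole, so the counters can still straddle a boundary. Closing that gap is what the proposed types would do.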

Answer 62 · 2024-08-06T16:33:44.000Z

@MatthiasKunnen: So it is the Pack=1 part of that C# declaration that is not aligning poll_time to 64-bit boundaries?
In that case, I think the way you are doing it, with [8]byte, is really your only option. This proposal (or any other that I know of) will not help in that situation.

Answer 63 · 2024-08-06T17:42:44.000Z

@randall77, apologies, I was using the wrong terminology. C# does not do any alignment with pack=1. Rather, it packs all fields as tightly as possible with no extra padding. Go does not do this in case of types such as uint64 and uint8. That being said, you are correct that this proposal will not help in this situation (I'm looking for padding decrease, not alignment increase) so I'll refrain from further comments to not derail the discussion. My previous comments can be minimized if desired.

Answer 64 · 2024-08-07T15:55:43.000Z

I'm interested in this for the purpose of mapping shared memory (named memory in Windows, memory mapped file in Linux). Being able to map structs to memory requires alignment.

This is a snippet of what I am using at the moment:

type Header struct {
	Status [4]byte // UTF-8 string

	Version uint32

	Revision uint32

	// The unix time when the last update to the data occurred.
	// Get int using GetLastUpdate.
	LastUpdate [8]byte

	// Offset of the Sensor section from beginning of Header.
	Offset uint32
}

<…>

uint32 is aligned at the moment but this might change, uint64 is not, it has padding, so I'm using [8]byte.
<…>
@randall77 LastUpdate contains a 64-bit integer but using uint64 as the type takes more space than 64 bits.
It was a while ago but I believe this was due to padding or something the compiler does for optimization.

Here's the reason: Go makes sure each field in a struct type is aligned according to the natural alignment of the field's type.
If you look at your type, you'll see that Status [4]byte occupies 32 bits, so the next 32-bit field, Version, is naturally aligned; since the field after that, Revision, is also 32 bits wide, it is naturally aligned too.
These three fields take up 4×3=12 bytes. If the next field, LastUpdate, were a 64-bit (8-byte) integer, it would not be naturally aligned at offset 12, so the compiler would insert 4 bytes of padding before it, bringing the preceding space to 16 bytes in total, which is divisible by 8 and makes the field naturally aligned.

Since a single byte is always naturally aligned, using [8]byte instead of 64-bit integer does not make the compiler insert any padding before that field. This is why you see differences in type size in both cases.
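The padding can be observed directly with unsafe.Offsetof and unsafe.Sizeof. The sketch below uses two made-up variants of the Header type from the earlier comment (headerBytes and headerUint64 are names invented here). The numbers in the comments are what the gc compiler produces on 64-bit targets such as amd64; on 32-bit x86 and ARM, uint64 is only 4-byte aligned, so there both variants come out the same.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerBytes mirrors the original Header: LastUpdate is a byte array with
// 1-byte alignment, so no padding is inserted before it.
type headerBytes struct {
	Status     [4]byte
	Version    uint32
	Revision   uint32
	LastUpdate [8]byte
	Offset     uint32
}

// headerUint64 is the same struct with LastUpdate declared as uint64.
type headerUint64 struct {
	Status     [4]byte
	Version    uint32
	Revision   uint32
	LastUpdate uint64
	Offset     uint32
}

// lastUpdateOffsets returns the offset of LastUpdate in both variants.
func lastUpdateOffsets() (packed, natural uintptr) {
	var b headerBytes
	var u headerUint64
	return unsafe.Offsetof(b.LastUpdate), unsafe.Offsetof(u.LastUpdate)
}

// sizes returns the total size of both variants.
func sizes() (packed, natural uintptr) {
	return unsafe.Sizeof(headerBytes{}), unsafe.Sizeof(headerUint64{})
}

func main() {
	po, no := lastUpdateOffsets()
	ps, ns := sizes()
	fmt.Println("[8]byte: offset", po, "size", ps) // offset 12 size 24
	fmt.Println("uint64:  offset", no, "size", ns) // offset 16 size 32 on amd64
}
```

On amd64 the uint64 variant also gains 4 bytes of tail padding, because the struct size is rounded up to a multiple of its 8-byte alignment.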

This is a program that shares data by sharing a part of its memory that is packed according to a documented structure.

In essence, I'm trying to mirror C# code like this:

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct SharedMemory {
    public UInt32 version;
    public UInt8 rev; // This must be byte in Go because, I believe, uint8 still takes 32 bits
    public long poll_time; // This must be [8]byte in Go because uint64 takes more than 64 bits
};

While your desire is understandable, please note an important thing: some hardware architectures require memory loads and stores to be performed on addresses naturally aligned for the types ("sizes") of data they operate on; basically, you cannot perform an (imaginary) machine instruction LOAD $ADDR -> %INT64_REG unless $ADDR is aligned on an 8-byte boundary; performing such an operation would make the CPU (!) generate an error at runtime.
I believe this is the reason Go makes all fields in structs naturally aligned for their respective types.

x86, which is most probably the architecture you're using, does not have the above restriction: instructions operating on unaligned memory perform slower but do not fail (while I cannot present any proof ATM, I also believe x86 even preserves atomicity for such operations).

What I'm leading you to is that if Go were to allow what you're after (that is, to have LastUpdate be a 64-bit integer that is not naturally aligned), an attempt to read or modify it (by the machine-level "integer-sized" instructions the compiler would likely have generated for that) would fail on some architectures Go supports.

In other words, if Go had something like #pragma pack(1) of some C compilers or that Pack = 1 of .NET's interop, your code would need to be guarded by build constraints making it compilable only on amd64 and 386. (This also hints that such a feature, if implemented, would need to somehow require the usage of unsafe, as it can break the main guarantee provided by Go: that what its compiler generates cannot be incorrect from the PoV of the target hardware.)

As you can see, at the moment reading that 64-bit field using something like encoding/binary.LittleEndian.Uint64 definitely looks unwieldy but is guaranteed to work on any H/W arch.

If you're 100% sure this code will only ever work on x86, you could write a helper function which would still read that memory as an integer using type-punning made possible by unsafe (basically take the address of the 1st byte of that [8]byte array, reinterpret it as a pointer to a 64-bit integer and dereference the result).
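Both approaches can be sketched side by side. The Header type is the one from the earlier comment; newHeader, lastUpdatePortable and lastUpdateUnsafe are hypothetical helper names. The unsafe variant assumes a little-endian CPU that tolerates unaligned 64-bit loads (amd64, 386), so real code would have to put it behind build constraints.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// Header mirrors the packed shared-memory layout; LastUpdate sits at
// offset 12, which is not 8-byte aligned.
type Header struct {
	Status     [4]byte
	Version    uint32
	Revision   uint32
	LastUpdate [8]byte
	Offset     uint32
}

// newHeader builds a Header whose LastUpdate holds ts in little-endian order.
func newHeader(ts int64) *Header {
	h := &Header{}
	binary.LittleEndian.PutUint64(h.LastUpdate[:], uint64(ts))
	return h
}

// lastUpdatePortable decodes the field byte by byte; it works on any
// architecture regardless of alignment or host byte order.
func (h *Header) lastUpdatePortable() int64 {
	return int64(binary.LittleEndian.Uint64(h.LastUpdate[:]))
}

// lastUpdateUnsafe reinterprets the array's address as *int64 and loads
// through it. ASSUMPTION: little-endian host that allows unaligned loads.
func (h *Header) lastUpdateUnsafe() int64 {
	return *(*int64)(unsafe.Pointer(&h.LastUpdate))
}

func main() {
	h := newHeader(1722988800)
	fmt.Println(h.lastUpdatePortable()) // 1722988800
	fmt.Println(h.lastUpdateUnsafe())   // same value on amd64 and 386
}
```

It is worth benchmarking before reaching for the unsafe version: on targets that allow unaligned access, the gc compiler can usually merge the byte loads in binary.LittleEndian.Uint64 into a single load, so the portable helper may already be just as fast.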

Answer 65 · 2024-08-07T16:19:11.000Z

@kostix, thank you for this dive into the reasoning behind it. Very interesting and informative!