proposal: runtime: add AlignedN types that can be used to increase alignment
ianlancetaylor opened this issue · 65 comments
The sync/atomic package has this in the docs in the "Bugs" section: "On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned." This makes it difficult to use atomic operations in types that may not necessarily be at the beginning of an allocated struct or slice. For example, sync.WaitGroup does this:
// 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
// 64-bit atomic operations require 64-bit alignment, but 32-bit
// compilers do not ensure it. So we allocate 12 bytes and then use
// the aligned 8 bytes in them as state.
state1 [12]byte
and this:
func (wg *WaitGroup) state() *uint64 {
if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
return (*uint64)(unsafe.Pointer(&wg.state1))
} else {
return (*uint64)(unsafe.Pointer(&wg.state1[4]))
}
}
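The workaround quoted above can be tried in isolation. The following is a minimal sketch, not the sync package's actual code: the `counter` type, its `sema` field, and the `add` and `stateAligned` helpers are hypothetical names introduced for illustration. It assumes the struct is at least 4-byte aligned (which the `uint32` field ensures), so that either `buf[0]` or `buf[4]` lies on an 8-byte boundary.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// counter mimics the layout trick: 12 bytes of storage, of which an
// 8-byte-aligned window is used as the 64-bit state. The uint32 field
// gives the struct 4-byte alignment, so either buf[0] or buf[4] is on
// an 8-byte boundary.
type counter struct {
	buf  [12]byte
	sema uint32
}

// state returns a pointer to the 8-byte-aligned window inside buf.
func (c *counter) state() *uint64 {
	if uintptr(unsafe.Pointer(&c.buf))%8 == 0 {
		return (*uint64)(unsafe.Pointer(&c.buf))
	}
	return (*uint64)(unsafe.Pointer(&c.buf[4]))
}

// add atomically adds delta to the state and returns the new value.
func (c *counter) add(delta uint64) uint64 {
	return atomic.AddUint64(c.state(), delta)
}

// stateAligned reports whether the state pointer is 8-byte aligned.
func stateAligned(c *counter) bool {
	return uintptr(unsafe.Pointer(c.state()))%8 == 0
}

func main() {
	c := new(counter)
	fmt.Println(stateAligned(c), c.add(1))
}
```

The cost of the trick is visible here: four of the twelve bytes are always wasted, and every access goes through a runtime address check.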
Further, on x86 there are vector instructions that require alignment to 16 bytes, and there are even some instructions (e.g., vmovaps with VEX.256) that require 32-byte alignment. While those instructions are not currently generated by the gc compiler, one can easily imagine using them in assembler code, which will require the values to be appropriately aligned.
To permit programmers to force the desired alignment, I propose that we add new types to the runtime package: runtime.Aligned2, runtime.Aligned4, runtime.Aligned8, runtime.Aligned16, runtime.Aligned32, runtime.Aligned64, runtime.Aligned128. (We could also use bit values, giving us runtime.Aligned16 through runtime.Aligned1024, if that seems clearer.)
These types will be identical to the type struct{} except that they will have the alignment implied by the name. This will make it possible to write a struct as
type vector struct {
vals [16]byte
_ runtime.Aligned16
}
and ensure that instances of this struct will always be aligned to a 16 byte boundary.
It will be possible to change sync.WaitGroup to be
type WaitGroup struct {
noCopy noCopy
_ runtime.Aligned8
state uint64
sema uint32
}
simplifying the code.
Although this functionality will not be used widely, it does provide a facility that we need today without requiring awkward workarounds. The drawback is the addition of a new concept to the runtime, though I think it is fairly clear to those people who need to use it.
Another complexity is that we will have to decide whether the size of a value is always a multiple of the alignment of the value. Currently that is the case. It would not be the case for the runtime.AlignedN values. Should it be the case for any struct that contains a field with one of those types? If the size of a value is not always a multiple of the alignment, we will have to modify the memory allocator to support that concept. I don't think that will be particularly difficult, but I haven't really looked.
A few minor comments:
- If we do this, we should probably do #17586 too. It's not just cmd/vet that relies on go/types for information about what cmd/compile will do alignment-wise. Another example is wasted.
- Your struct vector above will have a byte of padding at the end to avoid GC problems. You probably want the struct{} field as the first field. And we'll want to carefully document that recommendation. I mention this only because if you missed this detail, others definitely will.
- This could also be done with a //go: annotation on the type. We already have other //go: annotations on types. Don't panic. I'm not suggesting we do this. (Even the mere mention of annotations tends to raise banshee-level howling.) But it does raise the general question about when and why to use magic embedded types vs annotations vs perhaps some other general mechanism that doesn't exist yet.
It's true that we do have go:nointerface and go:notinheap annotations on types, but neither is documented and the latter is clearly only for the runtime. I suppose my general reaction to supporting //go: annotations on types can be summed up as https://www.youtube.com/watch?v=hulm_T_xnwY .
@ianlancetaylor when you wrote
(We could also use runtime.Aligned16, etc., if that seems clearer.)
I don't see how this differs from the previous sentence.
@cespare I meant to imply using bit values rather than byte values. Updated original comment to clarify. Those names would mean that the AlignedN names would correspond to intN names.
I agree we should fix this problem. I am less certain about how to fix it. Perhaps to start we should just align 64-bit integers to 64-bit addresses on 32-bit platforms. It's called out in sync/atomic because it's basically a bug on our side, one that we've just not fixed.
The solution proposed here essentially assumes the compiler will not reorder fields, at least not if these tags are present. I don't think we've fully closed the door on that (#10014). In that issue (two years ago), I argued that it is important for the programmer to have control over locality, so wholesale reordering of fields is not great (for example, sort by size and then lay out would give optimal packing but I think be too invasive).
At the same time, I am getting tired of looking for uint32-sized or bool-sized holes when adding fields to existing structs, and even more I am getting tired of being forced to choose between "understandable struct definition" and "small-in-memory struct definition". I do wonder if the compiler should be able by default to sift individual small fields up into gaps that would otherwise go unfilled, but not otherwise reorder the definitions. This is getting off-topic for this issue, except that any such scheme would need an override annotation for cgo and so that might give a mechanism for expressing alignment as well; of course any reordering would need to keep alignment in mind. I don't have any good ideas.
Also not every variable that needs alignment is a field in a struct. I'm not sure what to do about that either. Code might declare 'var x [16]byte' and want to pass it to something that requires 16-byte alignment, for example. Maybe that's getting too far ahead of ourselves, but it's worth keeping in mind.
I had hoped that alignment would be a property of a type, not a specific declaration. Are there cases where that's not tenable?
runtime.AlignedCache (which could just alias an appropriate AlignedN type) would help address #19025.
I did not mean to imply that this approach meant that the compiler could not reorder fields. The language spec does not state that the alignment of one field in a struct implies anything at all about the alignment of subsequent fields in the struct. I envision any uses of this as being of the form
type Vec struct {
_ runtime.Aligned16
b [16]byte
}
Here we know that any instance of Vec is aligned to a 16-byte boundary.
I agree that alignment should be a property of a type, and I believe that that satisfies all alignment needs in Go. The question is how you specify that alignment, and whether it can be done without using a magic comment. This proposal is one approach: in effect, you can only specify the alignment of struct types, and you do so by adding a field of type runtime.AlignedN.
For comparison, in C (with GCC extensions) you specify alignment of a type by writing
typedef T ... __attribute__((aligned(n)));
You can also specify alignment of a specific variable in the same way. Or, you can implement alignment for a specific memory allocation by using memalign or posix_memalign (since C's memory allocator does not understand types, it is generally necessary to use these functions when allocating memory for an aligned type).
Clearly for Go it would be nicer to be able to declare the alignment of any type, rather than this proposal which in effect only permits you to declare the alignment of a struct. If we can figure out a way to do that, we should. But I'd really rather not do it via a magic comment.
For sync/atomic, the biggest problem is usually not the alignment of the whole structure (because we have at least 8-byte alignment on structs larger than 8 bytes), but ensuring that the next uint64 will be aligned correctly within the struct.
Yes, understood. With this proposal you write "the next uint64" as a field of type alignedUint64, a struct defined as
type alignedUint64 struct {
_ runtime.Aligned8
v uint64
}
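For comparison with the `alignedUint64` idea, the typed atomics that were added to sync/atomic in Go 1.19 already behave this way for the 64-bit case: `atomic.Int64` and `atomic.Uint64` are documented as automatically aligned to 64-bit boundaries in structs, even on 32-bit systems. This is not part of the proposal; the sketch below only shows the effect. The `stats` type and `hitsOffset` helper are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// stats places a 64-bit counter after a 32-bit field, the layout that
// is problematic for a plain uint64 on 32-bit platforms.
type stats struct {
	flags uint32
	hits  atomic.Uint64 // kept on an 8-byte boundary by the compiler
}

// hitsOffset returns the byte offset of the hits field within stats.
func hitsOffset() uintptr {
	var s stats
	return unsafe.Offsetof(s.hits)
}

func main() {
	var s stats
	s.hits.Add(1)
	fmt.Println(hitsOffset()%8 == 0, s.hits.Load())
}
```

This covers only 8-byte alignment of atomics; it does not address the 16-, 32-, or 64-byte cases discussed in this issue.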
Vectors and sync/atomic are not the only uses of aligned types, so I think a more general solution would be a good idea if we can find one.
Here's a slight variation on the original proposal:
- Add runtime.Align2, ... runtime.Align128. (Note: "align", not "aligned")
- These are just like struct{}, except that if a runtime.AlignN is used as a struct field, it specifies the alignment of the following field (in declared order).
- If a runtime.AlignN is declared as the last struct field, it specifies the alignment of the entire struct.
This gets to @minux's point about aligning particular struct fields. You could even use multiple AlignNs to align multiple fields in a single struct.
(AlignN is more like a magic comment than AlignedN is, but at least it isn't a comment.)
@ianlancetaylor Thanks for the clarification. I was slightly confused by the fact that in your original post 'struct vector' (really should be 'type vector struct') puts the alignment after the field, not before.
If a runtime.AlignN is declared as the last struct field, it specifies the alignment of the entire struct.
Zero-sized final fields cause cmd/compile to insert a padding byte at the end. I imagine that is unacceptable in many of the cases for which increasing alignment is important.
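The padding behaviour described here can be observed with `unsafe.Sizeof`. This is a small sketch with hypothetical type names (`markerLast`, `markerFirst`); the exact sizes are a property of the gc compiler's layout rules rather than something the language spec promises, so the program prints them instead of assuming them.

```go
package main

import (
	"fmt"
	"unsafe"
)

// markerLast ends in a zero-sized field.
type markerLast struct {
	v uint64
	_ struct{}
}

// markerFirst starts with the same zero-sized field.
type markerFirst struct {
	_ struct{}
	v uint64
}

// sizes returns the sizes of the two layouts in bytes.
func sizes() (last, first uintptr) {
	return unsafe.Sizeof(markerLast{}), unsafe.Sizeof(markerFirst{})
}

func main() {
	last, first := sizes()
	fmt.Printf("marker last: %d bytes, marker first: %d bytes\n", last, first)
}
```

With cmd/compile the trailing-marker layout comes out larger, because the extra byte is rounded up to the struct's alignment. That is why placing the marker first is the safer convention.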
As Minux pointed out (hard to tell with the Github-mangling of the email reply), AlignN declared anywhere would end up specifying a min alignment for the entire struct, since a struct can't have alignment less than any of its fields.
Thanks, Russ, I missed that.
Clever, Minux. Seems like it might be worth accepting it is a language change and defining unsafe.AlignedByte and letting folks take it from there.
alignedbyte is clever but I'm a little uncomfortable with the idea that [13]alignedbyte has an alignment requirement of 13 bytes.
Perhaps it could specify a minimum alignment, and compilers could choose to round up to the nearest power of two.
Regarding Minux's idea:
type Align16 [0]aligned16 // zero size, can be embedded into other structs
// to force alignment of the next field (and also the whole structure)
To clarify, isn't there "magic" required to guarantee this relationship with the next field (same as in my AlignN version), if the compiler can reorder fields?
@josharian "minimum alignment" is not a well-defined concept unless the space of possible requests are all multiples or divisors of each other. A multiple of 16 is not a multiple of 13.
I tend to agree with Ian that alignedbyte is a little too much rope.
Ack.
The thing I am struggling with is that once you start introducing magic, it is unclear which the right magic is. A magic field type in package runtime? A comment-based type annotation? An interpreted field tag (or type tag)? A magic interface in package runtime (check whether a type has an Aligned2 method)?
This does feel a bit like a language change, though, and unsafe does seem like the right home for manually messing with alignments. Maybe there's an alternative unsafe formation that provides less rope? Here's a terrible idea to start: unsafe.AlignedShift: [n]unsafe.AlignedShift has alignment 1<<n. :)
I think the compiler definitely cannot reorder arbitrary structures.
What you think is only a little interesting; telling us why is much more interesting.
Perhaps we can embed an unsafe.PackedStruct to signify that the compiler can rearrange a struct, but I still think packing a struct is better done by another program, not automatically by the compiler.
That's fine for structs that people don't look at. What bothers me most about packing structs explicitly (by hand or with a program) is that doing so rewrites the source code to be less readable.
Allowing the compiler to reorder structures would be tricky: we may need to distinguish between unoptimized layouts (which the compiler should obviously fix) and hand-optimized layouts.
If the author has intentionally adjusted cache-line locality or packed the struct to match a kernel or C data structure, how do we tell the compiler not to break that?
I would guess that most uses of alignment fall into one of two categories:
- Passing a pointer to an unexported field to an atomic function.
- Passing a struct to a C function (via cgo or a syscall).
We normally let the compiler figure out details of allocation and layout. Perhaps we could do the same for alignment most of the time.
We could do something akin to escape analysis to see which fields need to be aligned:
- If a pointer to a field is passed as a function parameter that requires a particular alignment, then both the field and the struct require at least that alignment. Pointers passed to functions in package atomic must be aligned to their element size.
- If a pointer to a field or struct is passed to a cgo function call, syscall, or converted to an unsafe.Pointer which may be passed to a cgo function call or syscall, then the field and the struct require C-compatible alignment.
For the few remaining cases (are there any?), perhaps we could add a no-op function call (akin to runtime.KeepAlive):
package runtime
// Align marks its argument as requiring the given alignment.
// The ptr argument must be a pointer to a variable of a struct type,
// or a pointer to a field on a variable of a struct type.
// The alignment argument must be a compile-time constant.
func Align(ptr interface{}, alignment int)
The only situation I'm aware of that would require explicit calls to Align would be if a pointer is allocated and returned from a function in one package but the alignment constraints occur only in the calling package. I cannot think of any examples of such usage at the moment.
@bcmills a remaining case: argument to user-written assembly routine using vector instructions. (Minux mentioned this above too.)
@josharian Wouldn't vector assembly functions be amenable to the same kind of escape-analysis? But I suppose it's nontrivial to figure out which arguments propagate to a given assembly instruction.
We already pretty clearly have a bias toward //go: comments to annotate constraints on assembly functions. You mentioned a //go: comment on types earlier, but perhaps it belongs on the assembly function declarations instead?
//go:align64 ptr
func someAssemblyFunc(ptr unsafe.Pointer, offset int)
But to do that kind of analysis, we must first see the whole program.
Part of my point is that a bottom-up analysis (like we already do for heap escapes) would suffice in the vast majority of cases.
That is: I agree that it is possible, in principle, that a package B might import A and use fields from a struct defined in A in a way that requires a particular alignment. I disagree that that should affect the alignment of package A: either A should already be using those fields in a way that requires that same alignment, or the compiler should generate an error ("b.go:123: call to someFunction requires 16-byte alignment, but A.SomeStruct.X is only 8-byte aligned").
That's where runtime.Align would come into play: if there is some such pair of packages B and A, A would need to call runtime.Align on the relevant fields before allowing the value to escape from the package. But I am not aware of any examples of such packages B and A in practice. The cases I've seen that require alignment are generally all within the same package (e.g. a method calling an atomic function on an unexported field).
Could you give some concrete examples of packages with this sort of inverted cross-package alignment constraint?
package A exports a set of structs for common metrics, but
doesn't provide update methods for them, and another package B uses
sync/atomic to update the metrics
Yeah, don't do that. Is this a hypothetical problem, or do you have a concrete example of this pattern?
Then memory layout of types in A depends on whether you import package B or
not, and I argue that's a bad thing.
Under the analysis I'm suggesting, the layout of types in A does not depend on whether you import B. If A does not provide the correct alignment, compilation of B would fail with an error.
@bcmills I think your suggestion requires us to be able to determine the alignment of a type used by a C function. I don't know how we can do that. Required type alignment is not exposed in DWARF.
Required type alignment is not exposed in DWARF.
Hmm, good point. Still, I think the key insight holds: if we annotate function parameters at the points at which values escape the Go runtime, then there doesn't necessarily need to be any annotation on the types themselves. Perhaps that would imply the need for //go:alignN comments on functions which make cgo calls, but that isn't obviously worse than embedding tag-structs.
Plus, with the parameter-annotation approach we can match the actual alignment of the type to its usage: we can detect alignment errors at compile time. With the embedded tag-struct approach, it is not obvious to me that we can do any better than receiving a fatal signal at run-time.
In all honesty I think that being able to specify the alignment for a type is easier to understand, closer to what people expect, and less likely to have obscure errors.
The immediate concern is uint64 not being uint64-aligned on 32-bit systems. It probably should be. Assuming we do that, then maybe we can leave the bigger alignments until we understand the context in which it is needed.
Maybe we should put this proposal on hold?
I realize that I have a secondary unstated issue, which is for gccgo. gccgo uses the platform ABI for alignment, so I don't want to simply change the alignment of Go types. That means that I need some mechanism in gccgo to ensure that certain types are aligned as needed for atomic operations. But we can put this on hold for gc and I can invent something for gccgo.
If you'd like to experiment in gccgo, maybe start with //go:align N applying to the next declaration (N = bytes), whether that's a type declaration or a struct field declaration? That avoids showing up at runtime (like a field tag) and also adding new API (like new runtime types).
On hold per discussion above.
CL https://golang.org/cl/41143 mentions this issue.
My concern with the language changes is that they make it hard to have source code that compiles in older versions of Go. The benefit of a comment (e.g. //go:align cache) is that it can be ignored by Go versions which cannot interpret it. I currently have libraries that compile as far back as Go 1.4 (github.com/ugorji/go/codec). I would like to leverage a better cache-line alignment model than the hacked (_ [N]byte // padding) I have all around the place.
@dvyukov points out in https://go-review.googlesource.com/c/go/+/138076/3/src/runtime/mheap.go#146 that we've in fact broken cache line padding in the runtime for various arrays because we only align the size of the array element to a multiple of the cache line size, but have no guarantee that the array will start on a cache line boundary. As a result, neighboring elements can alias to the same cache line. Currently we can only solve this by adding a full cache line's worth of padding between each element, which is wasteful and can needlessly split elements across cache lines. Having a way to indicate that these arrays must be cache-line aligned would be a much better solution.
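The problem can be made concrete with a small program. This is an illustrative sketch, not the runtime's code: the `shard` type, the `shards` array, and the 64-byte `cacheLine` constant are assumptions chosen for the example. Each element is padded to exactly one assumed cache line, yet nothing forces the array itself to begin on a line boundary.

```go
package main

import (
	"fmt"
	"unsafe"
)

const cacheLine = 64 // assumed cache line size in bytes

// shard is padded so that its size is exactly one assumed cache line.
type shard struct {
	n uint64
	_ [cacheLine - 8]byte
}

var shards [4]shard

// shardSize returns the size of one element in bytes.
func shardSize() uintptr {
	return unsafe.Sizeof(shard{})
}

// lineOffsets returns each element's start address modulo the cache
// line size. A non-zero value means that element straddles two lines
// and shares each of them with a neighbor.
func lineOffsets() [4]uintptr {
	var out [4]uintptr
	for i := range shards {
		out[i] = uintptr(unsafe.Pointer(&shards[i])) % cacheLine
	}
	return out
}

func main() {
	fmt.Println(shardSize(), lineOffsets())
}
```

Because the stride equals the line size, every element has the same offset within its line. If that offset happens to be zero the padding works as intended; if not, every element is split across two lines.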
One motivation for taking another look at this issue could be the assembler's support of AVX-512 added in Go 1.11. AVX-512 code operating on 512 bit registers works best when the data on which it operates is 64 byte aligned, as each unaligned access is a cache-line split. As there doesn't currently seem to be a way to specify the alignment of slices and arrays, it's difficult to take full advantage of AVX-512 in Go without relying on a custom allocator or adding potentially unnecessary peeling code to the AVX-512 algorithm to handle leading unaligned data.
To test this out I created a simple AVX-512 function in Go assembler that adds one array of 32 bit integers to another. The function naively assumes that the input and output arrays are the same size and that this size is divisible by 16.
//func sum(a []int32, b []int32)
TEXT ·sum(SB), NOSPLIT, $0
MOVQ a_base+0(FP), SI
MOVQ a_len+8(FP), DX
MOVQ b_base+24(FP), DI
loop1:
VMOVDQU32 -64(SI)(DX * 4), Z25
VMOVDQU32 -64(DI)(DX * 4), Z26
VPADDD Z25, Z26, Z25
VMOVDQU32 Z25, -64(SI)(DX * 4)
SUBQ $16, DX
JNE loop1
RET
I then created aligned and unaligned versions of a benchmark to test this code. The unaligned version looked something like this (I've edited out some initialisation code)
//go:noescape
func sum(in []int32, out []int32)
type container struct {
in [10 * 1024]int32
out [10 * 1024]int32
}
var c container
func BenchmarkAlign(b *testing.B) {
for n := 0; n < b.N; n++ {
sum(c.in[:], c.out[:])
}
}
In my first test, c.in and c.out happened to be 32 byte and not 64 byte aligned. I forced them to be 64 byte aligned by preceding them with a _ [32]int8, e.g.,
type container struct {
_ [32]int8
in [10 * 1024]int32
out [10 * 1024]int32
}
and re-ran the benchmark. Doing so improved the speed of the benchmark by 1.33x on my i9-7900X.
Of course, there's no guarantee that the addition of a 32 byte array to the start of a structure will always provide 64 byte alignment for the subsequent element. Therefore, it would be much nicer if I could write
type container struct {
_ runtime.Aligned64
in [10 * 1024]int32
out [10 * 1024]int32
}
Note I was also able to get the structure to be 64 byte aligned by simply deleting a fmt.Printf debug statement in the initialisation code. Doing so removed the fmt import, changed the alignment of c, and improved the speed of my benchmark by 33%.
This is probably an awful idea, but what about struct tags?
type fussy struct {
dontCare uint64
alignMe uint64 `align:32`
}
A naive analysis suggests that this struct would necessarily have a size of 64 bytes. If the struct's total size is not a multiple of 32 bytes, then an array of these would meet the alignment requirement only sometimes. If the offset of alignMe in the struct is not a multiple of 32 bytes, then it won't be aligned unless the struct itself is specifically not aligned.
... Now that I've written it down, I vote against it. I think it's more Go-like to require that the padding be explicit, using _ members.
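The arithmetic behind the "size of 64 bytes" claim can be checked with a tiny layout calculator. This is a sketch of the conventional rule (each field at a multiple of its alignment, total size rounded up to the largest alignment), not the compiler's implementation; `field`, `alignUp`, and `layout` are hypothetical names.

```go
package main

import "fmt"

// field describes a struct field by size and required alignment, in bytes.
type field struct {
	size, align uintptr
}

// alignUp rounds n up to the next multiple of align (a power of two).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// layout returns each field's offset and the total struct size. Every
// field is placed at a multiple of its alignment, and the size is
// rounded up to the largest field alignment so that arrays of the
// struct keep every element aligned.
func layout(fields []field) (offsets []uintptr, size uintptr) {
	var maxAlign uintptr = 1
	for _, f := range fields {
		size = alignUp(size, f.align)
		offsets = append(offsets, size)
		size += f.size
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return offsets, alignUp(size, maxAlign)
}

func main() {
	// dontCare: uint64 with alignment 8.
	// alignMe: uint64 with a requested alignment of 32.
	offsets, size := layout([]field{{8, 8}, {8, 32}})
	fmt.Println(offsets, size) // [0 32] 64
}
```

Two 8-byte fields thus occupy 64 bytes, 48 of which are padding. That cost is invisible in the tagged declaration, which is the argument for making the padding explicit.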
Possible idiom:
type runtime.Aligned struct{}
type fussy struct {
dontCare uint64
_ [32]runtime.Aligned
alignMe uint64
}
(and if you do [17]runtime.Aligned, the compiler comes after you with a lead pipe.)
What about allowing tags to be applied to structs as well as fields? These could then also be surfaced via reflect which may be useful.
We would then have something like:
type WaitGroup struct `runtime:"nocopy,align64"` {
state uint64
sema uint32
}
Change https://golang.org/cl/237737 mentions this issue: syscall: add Get/Set methods to Stat_t.Size, Flock_t.{Start,Len}
Change https://golang.org/cl/308971 mentions this issue: cmd/compile: add internal/align package for runtime
One other thought that's probably horrible, but I'll leave it here anyway: how about making runtime.Aligned work by using the size of the array that it's part of (Aligned would be zero sized)?
type WaitGroup struct {
_ [8]runtime.Aligned
state uint64
sema uint32
}
That allows any alignment to be specified without introducing a zillion new types.
Edit: I see that's pretty similar to this.
For the [17]runtime.Aligned case, maybe it's reasonable to give the compiler the freedom to arbitrarily increase the alignment if it's not compatible with hardware constraints. So on a machine that didn't allow unaligned word access, it could round up to the nearest word-size multiple.
how about making runtime.Aligned work by using the size of the array that it's part of (Aligned would be zero sized) ?
This is how internal/align.elemT works in https://golang.org/cl/308971.
Sounds like an interesting topic, and someone gave a talk at GopherCon EU 2023. Any update on this proposal?
I have a use case that I don't see mentioned here. I'm working on a concurrent hash map and it requires a guaranteed alignment of 64 for the buckets that it allocates internally. I'm storing these buckets in a huge array where a bucket pointer is stored together with a [0, 64) 6-bit integer. I need guaranteed alignment so that I can pack the 6-bit integer into the pointer value like so:
// unsafe.Pointer value is legal as long as it points _somewhere_ into the object. Because
// our buckets always have sizeof >= 64B (and alignment of 64) it means that we can pack the
// 6-bit value into the lowest 6 bits so that the resulting pointer will continue to point into
// the original allocated object.
packed := unsafe.Add(unsafe.Pointer(bucket), smallInt)
atomic.StorePointer(&arrayElement.ptr, packed)
// and unpacking
packed := atomic.LoadPointer(&arrayElement.ptr)
bucket := (*Bucket)(unsafe.Pointer(uintptr(packed) &^ 0b11_1111))
smallInt := uint64(uintptr(packed) & 0b11_1111)
For my benchmark cases the allocator just happens to give the buckets alignment of 64 but I really need the guarantee as this packing produces a very measurable performance boost on multiple metrics compared to storing both values in their individual fields. I would like to see this proposal advance.
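The bit manipulation involved can be isolated from the unsafe pointer handling. The sketch below works on plain `uintptr` values purely for illustration; it is not the commenter's map code, and `pack`, `unpack`, `tagBits`, and `tagMask` are hypothetical names. It shows why the scheme depends on a 64-byte alignment guarantee: without it, the low six bits of the address are not free.

```go
package main

import "fmt"

const tagBits = 6
const tagMask = uintptr(1)<<tagBits - 1 // 0b11_1111

// pack stores a 6-bit tag in the low bits of a 64-byte-aligned address.
// It returns false if the address is not suitably aligned or the tag
// does not fit in 6 bits.
func pack(addr uintptr, tag uint8) (uintptr, bool) {
	if addr&tagMask != 0 || uintptr(tag) > tagMask {
		return 0, false
	}
	return addr | uintptr(tag), true
}

// unpack splits a packed word back into the address and the tag.
func unpack(word uintptr) (addr uintptr, tag uint8) {
	return word &^ tagMask, uint8(word & tagMask)
}

func main() {
	word, ok := pack(0x1000, 37)
	addr, tag := unpack(word)
	fmt.Printf("%#x %v %#x %d\n", word, ok, addr, tag) // 0x1025 true 0x1000 37

	// An address that is only 32-byte aligned cannot carry a 6-bit tag.
	_, ok = pack(0x1020, 37)
	fmt.Println(ok) // false
}
```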
I have a use case where I wish to use the last bit of the address of a generic struct for pointer tagging. The structs are stored in an array and I will use uintptrs to array elements to perform pointer tagging.
type SometimesAligned[T constraints.Unsigned] struct{
a,b,c T //constraints.Unsigned just limit to all unsigned int types.
}
Now SometimesAligned[byte] is the only thing that's aligned to 1 byte and thus isn't safe to use pointer tagging. Using
type Aligned0[T constraints.Unsigned] struct{
a,b,c T
_ uint16
}
gives me 2 bytes alignment but will introduce unnecessary wasted space. It'll waste even more spaces on larger types.
type Aligned1 struct{
values [2]uint16
}
only wastes 1 byte of space and essentially has the same capability as SometimesAligned[byte]. However, it's hard to make this generic.
I wish to avoid the wasted spaces as much as possible while keeping the generic capability because these structs can be created in large numbers in arrays.
Does a _ [0]uint16 field work for you?
I'm interested in this for the purpose of mapping shared memory (named memory in Windows, memory mapped file in Linux). Being able to map structs to memory requires alignment.
This is a snippet of what I am using at the moment:
type Header struct {
Status [4]byte // UTF-8 string
Version uint32
Revision uint32
// The unix time when the last update to the data occurred.
// Get int using GetLastUpdate.
LastUpdate [8]byte
// Offset of the Sensor section from beginning of Header.
Offset uint32
}
func (info Header) GetLastUpdate() int64 {
return int64(binary.LittleEndian.Uint64(info.LastUpdate[:]))
}
uint32 is aligned at the moment but this might change, uint64 is not, it has padding, so I'm using [8]byte.
Does a _ [0]uint16 field work for you?
Thank you. It looks like
type Aligned[T constraints.Unsigned] struct{
_ [0]uint16
a,b,c T
}
works ideally, but I kind of feel like it's a bit unintuitive.
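The effect of the zero-length array field can be confirmed with `unsafe.Alignof` and `unsafe.Sizeof`. In this sketch, `unsigned` is a local stand-in for constraints.Unsigned so the example has no external dependency, and `plain`, `aligned`, and `alignments` are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsigned is a local stand-in for constraints.Unsigned.
type unsigned interface {
	~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// plain has only the alignment of T.
type plain[T unsigned] struct {
	a, b, c T
}

// aligned has at least the alignment of uint16, at no cost in fields.
type aligned[T unsigned] struct {
	_       [0]uint16
	a, b, c T
}

// alignments reports the alignment of plain[byte], and the alignment
// and size of aligned[byte].
func alignments() (plainAlign, alignedAlign, alignedSize uintptr) {
	return unsafe.Alignof(plain[byte]{}),
		unsafe.Alignof(aligned[byte]{}),
		unsafe.Sizeof(aligned[byte]{})
}

func main() {
	fmt.Println(alignments()) // 1 2 4
}
```

For `byte` the struct grows from 3 to 4 bytes because the size is rounded up to the new alignment; for wider element types the marker field changes nothing.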
@G-M-twostay If this proposal is accepted, you will be able to write _ runtime.Aligned2 instead.
@MatthiasKunnen Are you saying your LastUpdate field contains a 64-bit integer, but is not 64-bit aligned? Are you trying to match a native Windows API that behaves like that? Or is there another reason you want non-64-bit aligned 64-bit integers?
This proposal won't help that situation (if I'm interpreting your situation correctly). It will only increase alignment, not decrease it.
@randall77 LastUpdate contains a 64-bit integer but using uint64 as the type takes more space than 64 bits. It was a while ago but I believe this was due to padding or something the compiler does for optimization.
This is a program that shares data by sharing a part of its memory that is packed according to a documented structure.
In essence, I'm trying to mirror C# code like this:
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct SharedMemory {
public UInt32 version;
public UInt8 rev; // This must be byte in Go because, I believe, uint8 still takes 32 bits
public long poll_time; // This must be [8]byte in Go because uint64 takes more than 64 bits
};
Not to turn this issue into my personal blog, but I have to report my ever-increasing thirst for this feature. My concurrent hash map has a len atomic.Int64 field and, as one would expect, it's getting hammered by all CPU cores on the insert path, causing false sharing and whatever else on the other fields. Manually adding padding so that the field is on its own 128-byte chunk, like the following, has quite the impact on performance.
type Map[K comparable, V any] struct {
//...
pad [128 - 40]byte
len atomic.Int64
}
old: BenchmarkInsert/size=100000-8 40.77 ns/insert
new: BenchmarkInsert/size=100000-8 27.99 ns/insert
I have zero (0, nil) interest in manually maintaining the correct amount of padding. So I need runtime.Aligned64 to make the map work correctly and runtime.AlignedCacheLine to keep it simple and fast.
@MatthiasKunnen: So it is the Pack=1 part of that C# declaration that is not aligning poll_time to 64-bit boundaries?
In that case, I think the way you are doing it, with [8]byte, is really your only option. This proposal (or any other that I know of) will not help in that situation.
@randall77, apologies, I was using the wrong terminology. C# does not do any alignment with pack=1. Rather, it packs all fields as tightly as possible with no extra padding. Go does not do this in case of types such as uint64 and uint8. That being said, you are correct that this proposal will not help in this situation (I'm looking for padding decrease, not alignment increase) so I'll refrain from further comments to not derail the discussion. My previous comments can be minimized if desired.
I'm interested in this for the purpose of mapping shared memory (named memory in Windows, memory mapped file in Linux). Being able to map structs to memory requires alignment.
This is a snippet of what I am using at the moment:
type Header struct {
Status [4]byte // UTF-8 string
Version uint32
Revision uint32
// The unix time when the last update to the data occurred.
// Get int using GetLastUpdate.
LastUpdate [8]byte
// Offset of the Sensor section from beginning of Header.
Offset uint32
}
<…>
uint32 is aligned at the moment but this might change, uint64 is not, it has padding, so I'm using [8]byte.
<…>
@randall77 LastUpdate contains a 64-bit integer but using uint64 as the type takes more space than 64 bits.
It was a while ago but I believe this was due to padding or something the compiler does for optimization.
Here's the reason: Go makes sure each field in a struct type is aligned using the natural alignment of the field's type.
If you look at your types, you'll see that Status [4]byte is 32-bit, which makes the next 32-bit-sized field, Version, naturally aligned; since the next field, Revision, is also 32-bit, it's still naturally aligned.
These three fields take up 4×3=12 bytes. If the next field, LastUpdate, were a 64-bit (8-byte) integer, it would not be naturally aligned "as is", so the compiler would insert 4 bytes of padding before it, bringing the preceding space to 16 bytes in total, which is evenly divisible by 8 and makes the field naturally aligned.
Since a single byte is always naturally aligned, using [8]byte instead of a 64-bit integer does not make the compiler insert any padding before that field. This is why you see differences in type size between the two cases.
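The two layouts can be compared directly with `unsafe.Offsetof`. This sketch reuses the field names from the quoted Header; the type names `headerBytes` and `headerUint64` and the helper are hypothetical. The offset of the `uint64` variant depends on the platform's alignment for `uint64` (8 on 64-bit gc targets, 4 on 386), so the program prints it rather than assuming a value.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerBytes stores the timestamp as a byte array (alignment 1).
type headerBytes struct {
	Status     [4]byte
	Version    uint32
	Revision   uint32
	LastUpdate [8]byte
	Offset     uint32
}

// headerUint64 stores the timestamp as a uint64, which the compiler
// places at a multiple of unsafe.Alignof(uint64(0)).
type headerUint64 struct {
	Status     [4]byte
	Version    uint32
	Revision   uint32
	LastUpdate uint64
	Offset     uint32
}

// lastUpdateOffsets returns the offset of LastUpdate in both layouts
// and the platform alignment of uint64.
func lastUpdateOffsets() (bytesOff, uint64Off, uint64Align uintptr) {
	var b headerBytes
	var u headerUint64
	return unsafe.Offsetof(b.LastUpdate), unsafe.Offsetof(u.LastUpdate), unsafe.Alignof(uint64(0))
}

func main() {
	b, u, a := lastUpdateOffsets()
	fmt.Printf("[8]byte at %d, uint64 at %d (uint64 alignment %d)\n", b, u, a)
	fmt.Println(unsafe.Sizeof(headerBytes{}), unsafe.Sizeof(headerUint64{}))
}
```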
This is a program that shares data by sharing a part of its memory that is packed according to a documented structure.
In essence, I'm trying to mirror C# code like this:
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct SharedMemory {
public UInt32 version;
public UInt8 rev; // This must be byte in Go because, I believe, uint8 still takes 32 bits
public long poll_time; // This must be [8]byte in Go because uint64 takes more than 64 bits
};
While your desire is understandable, please note an important thing: some hardware architectures require memory loads and stores to be performed on addresses naturally aligned for the types ("sizes") of data they operate on; basically, you cannot perform an (imaginary) machine instruction LOAD $ADDR -> %INT64_REG unless $ADDR is aligned on an 8-byte boundary; performing such an operation would make the CPU (!) generate an error at runtime.
I believe this is the reason Go makes all fields in structs naturally aligned for their respective types.
x86, which is most probably the architecture you're using, does not have the above restriction: the instructions operating on unaligned memory perform slower but do not fail (while I cannot present any proof ATM, I also believe x86 still preserves atomicity for such operations).
What I'm leading to is that if Go were to allow what you're after (that is, to have LastUpdate be a 64-bit integer that is not naturally aligned), an attempt to read or modify it (by the machine-level "integer sized" instructions the compiler would likely have generated for that) would fail on some architectures Go supports.
In other words, if Go had something like the #pragma pack(1) of some C compilers or the Pack = 1 of .NET's interop, your code would need to be guarded by build constraints making it compilable only on amd64 and 386. (This also hints that such a feature, if implemented, would need to somehow require the use of unsafe, as it can break the main guarantee provided by Go: that what its compiler generates cannot be incorrect from the PoV of the target hardware.)
As you can see, at the moment reading that 64-bit field using something like encoding/binary.LittleEndian.Uint64 definitely looks unwieldy but is guaranteed to work on any H/W arch.
If you're 100% sure this code will only ever run on x86, you could write a helper function which would still read that memory as an integer using type punning made possible by unsafe (basically, take the address of the 1st byte of that [8]byte array, reinterpret it as a pointer to a 64-bit integer, and dereference the result).
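For completeness, here is what the portable encoding/binary route might look like for the packed C# struct quoted above. It is a sketch under assumptions: the producer writes little-endian values with no padding (13 bytes in total), and the Go names `sharedMemory`, `packedSize`, and `decodeSharedMemory` are hypothetical. It reads from a byte slice instead of overlaying a Go struct on the mapping, so no unaligned loads are ever issued.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sharedMemory mirrors the fields of the packed C# struct. The Go
// struct itself uses normal Go layout; only the wire format is packed.
type sharedMemory struct {
	Version  uint32
	Rev      uint8
	PollTime int64
}

// packedSize is the size of the packed layout: 4 + 1 + 8 bytes.
const packedSize = 4 + 1 + 8

// decodeSharedMemory decodes the packed little-endian layout from b.
func decodeSharedMemory(b []byte) (sharedMemory, error) {
	if len(b) < packedSize {
		return sharedMemory{}, errors.New("buffer too short")
	}
	return sharedMemory{
		Version:  binary.LittleEndian.Uint32(b[0:4]),
		Rev:      b[4],
		PollTime: int64(binary.LittleEndian.Uint64(b[5:13])),
	}, nil
}

func main() {
	// version=2, rev=7, poll_time=123456789 (0x075BCD15)
	raw := []byte{2, 0, 0, 0, 7, 0x15, 0xCD, 0x5B, 0x07, 0, 0, 0, 0}
	m, err := decodeSharedMemory(raw)
	fmt.Println(m.Version, m.Rev, m.PollTime, err)
}
```

The decode step copies 13 bytes per read, which is usually negligible next to the cost of the shared-memory synchronization itself.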
@kostix, thank you for this dive into the reasoning behind it. Very interesting and informative!