Allow memory-overlapped fields for unmanaged unions
Motivation
Along the same lines as "Merge fields of the same type on multi-case union structs" and "More stack-efficient struct DUs", it is desirable to represent struct unions with the minimum memory footprint, i.e. sizeof<Tag> + sizeof<LargestCase> (plus any alignment padding), rather than the existing representation of sizeof<Tag> + sizeof<Case0> + sizeof<Case1> + ...
The existing representation is used because the .NET runtime forbids overlapping fields of reference type with fields of value type:
Offset values shall be non-negative. It is possible to overlap fields in this way, though offsets occupied by an object reference shall not overlap with offsets occupied by a built-in value type or a part of another object reference. While one object reference can completely overlap another, this is unverifiable
(Ecma-335, Partition II, §10.7)
This proposal differs from #699 and #1311: it avoids this limitation by requiring that all fields in the union be unmanaged.
A type is an unmanaged type if it's any of the following types:
- sbyte, byte, short, ushort, int, uint, long, ulong, nint, nuint, char, float, double, decimal, or bool
- Any enum type
- Any pointer type
- Any user-defined struct type that contains fields of unmanaged types only.
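The F# compiler can already check this property through the `'T: unmanaged` constraint, which is presumably the same test an `[<UnmanagedUnion>]` check would apply to every case field. In this sketch, the function `sizeOfUnmanaged` and the `Point` type are hypothetical names chosen for illustration. The function only compiles for type arguments that satisfy the constraint:

```fsharp
open System

// Compiles only for type arguments that satisfy F#'s `unmanaged` constraint
let sizeOfUnmanaged<'T when 'T: unmanaged> () = sizeof<'T>

// A user-defined struct whose fields are all unmanaged, so it is itself unmanaged
[<Struct>]
type Point = { X: int; Y: int }

printfn "int:       %d bytes" (sizeOfUnmanaged<int> ())
printfn "DayOfWeek: %d bytes" (sizeOfUnmanaged<DayOfWeek> ())
printfn "Point:     %d bytes" (sizeOfUnmanaged<Point> ())

// The following line would be rejected at compile time, because string is a reference type:
// sizeOfUnmanaged<string> ()
```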
Proposed Syntax
Unmanaged struct unions would use the same syntax as other discriminated unions, but would require a new [<UnmanagedUnion>] attribute.
[<Struct>]
[<UnmanagedUnion>]
type A =
    | A0
    | A1 of int
Any combination of fields and field names can work, provided that all fields are unmanaged and no two fields in the same case have the same name.
Questions:
- Should both [<Struct>] and [<UnmanagedUnion>] be required, or does [<UnmanagedUnion>] imply [<Struct>]?
- If [<UnmanagedUnion>] were not required, could the same output be generated, provided that all fields are known to be unmanaged?
Examples of Limitations
[<Struct>]
[<UnmanagedUnion>]
type B =
    | B0 of int * int              // Allowed: two int fields
    | B1 of (int * int)            // Not allowed: a reference tuple is a reference type
    | B2 of struct (int * int)     // Allowed
    | B3 of string                 // Not allowed: string is a reference type
    | B4 of obj                    // Not allowed: obj is a reference type
    | B5 of {| A: int |}           // Not allowed: an anonymous record is a reference type
    | B6 of struct {| A: int |}    // Allowed
    | B7 of struct {| A: obj |}    // Not allowed: the struct contains a reference-type field
Compiled Representation
Consider the following union:
[<Struct>]
[<UnmanagedUnion>]
type A =
    | A0
    | A1 of a: int
    | A2 of b: bool * c: unativeint
It is anticipated that the generated code would be roughly equivalent to:
// Concept compiler-generated code
open System.Runtime.InteropServices

#nowarn "9"

type A_Gen_Tag =
    | A0 = 0
    | A1 = 1
    | A2 = 2

// A0 has no fields so no need to create a struct
[<Struct>]
type A_Gen_A1 =
    {
        a: int
    }

[<Struct>]
type A_Gen_A2 =
    {
        b: bool
        c: unativeint
    }

// Having a separate cases struct avoids the need to compute field offsets. They can all be 0.
// The fields are `val mutable` so that a constructor can write a single case; a record
// expression would initialise every field and so overwrite the overlapping case.
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type A_Gen_Cases =
    [<FieldOffset(0)>]
    val mutable A1: A_Gen_A1
    [<FieldOffset(0)>]
    val mutable A2: A_Gen_A2

[<Struct>]
[<StructLayout(LayoutKind.Sequential)>] // Is this attribute required or useful?
type A_Gen =
    {
        Tag: A_Gen_Tag
        Cases: A_Gen_Cases
    }
    static member A0 =
        { Tag = A_Gen_Tag.A0; Cases = Unchecked.defaultof<A_Gen_Cases> }
    static member A1 (a: int) =
        let mutable cases = Unchecked.defaultof<A_Gen_Cases>
        cases.A1 <- { a = a }
        { Tag = A_Gen_Tag.A1; Cases = cases }
    static member A2 (b: bool, c: unativeint) =
        let mutable cases = Unchecked.defaultof<A_Gen_Cases>
        cases.A2 <- { b = b; c = c }
        { Tag = A_Gen_Tag.A2; Cases = cases }

// The union patterns should be switched using the Tag field and present the fields equivalent to other unions
let (|A0|A1|A2|) (a: A_Gen) =
    match a.Tag with
    | A_Gen_Tag.A0 -> A0
    | A_Gen_Tag.A1 -> A1 (a.Cases.A1.a)
    | A_Gen_Tag.A2 -> A2 (a.Cases.A2.b, a.Cases.A2.c)
    | _ -> failwith "Unreachable"

// Construction and usage would be roughly equivalent to:
let a0 = A_Gen.A0
let a1 = A_Gen.A1(1)
let a2 = A_Gen.A2(true, 2un)

match a0 with
| A0 -> printfn "A0"
| _ -> failwith "not A0"

match a1 with
| A1 a -> printfn "A1: %A" a
| _ -> failwith "not A1"

match a2 with
| A2 (b, c) -> printfn "A2: %A" (b, c)
| _ -> failwith "not A2"
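To make the saving concrete, the following sketch compares a struct DU compiled with today's representation against a hand-written overlapped equivalent in the style of the concept code. The type names (`PairsToday`, `PCase`, `QCase`, `PairsCases`, `PairsOverlapped`) are hypothetical, and the exact sizes depend on the runtime's layout rules:

```fsharp
open System.Runtime.InteropServices

// Today's struct DU: every case's fields get their own storage, plus an int32 tag.
// Multi-case struct unions need unique field names, hence px/py/qx/qy.
[<Struct>]
type PairsToday =
    | P of px: int64 * py: int64
    | Q of qx: int64 * qy: int64

// Hypothetical hand-written overlapped equivalent: one struct per case...
[<Struct>]
type PCase = { X: int64; Y: int64 }

[<Struct>]
type QCase = { X: int64; Y: int64 }

// ...both stored at offset 0, so the cases share the same 16 bytes
[<Struct; StructLayout(LayoutKind.Explicit)>]
type PairsCases =
    [<FieldOffset(0)>] val mutable P: PCase
    [<FieldOffset(0)>] val mutable Q: QCase

[<Struct>]
type PairsOverlapped = { Tag: int32; Cases: PairsCases }

printfn "struct DU today:  %d bytes" sizeof<PairsToday>
printfn "overlapped cases: %d bytes" sizeof<PairsOverlapped>
```

On a 64-bit process, the current representation is expected to take 40 bytes (four int64 fields plus an int32 tag, padded to 8-byte alignment). The overlapped version is expected to take 24 bytes (the int32 tag, 4 bytes of padding, and a single 16-byte case).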
Support For Generics
In principle there would be nothing wrong with supporting generics, provided every type parameter had the unmanaged constraint. However, this is not currently supported by the .NET runtime, which rejects generic types with explicit layout.
If the runtime restriction is lifted, it is expected that the user will still have to provide the unmanaged constraint explicitly.
// Ok
[<Struct>]
[<UnmanagedUnion>]
type UnmanagedOption<'T when 'T: unmanaged> =
    | UNone
    | USome of v: 'T

// Error: UnmanagedUnion requires that type parameter 'T have the unmanaged constraint, i.e. <'T when 'T: unmanaged>
[<Struct>]
[<UnmanagedUnion>]
type UnmanagedOption<'T> =
    | UNone
    | USome of v: 'T
Representation of Tag
The Tag of a union has traditionally been represented by int32; however, there are other logical choices:
- byte has the potential to give the smallest memory footprint, particularly for unions without any fields, but may degrade performance due to memory alignment issues. A byte can distinguish at most 256 cases, and there would be very few, if any, discriminated unions in the wild with more than that.
- nativeint is larger than int32 on most machines today, but has the potential to offer the best performance by using the machine's native integer size and memory alignment. It may result in the same struct size even on 64-bit machines, depending on how .NET lays out the memory, because an int32 tag is often padded to the alignment of the largest case anyway.
- Other candidates (int16, int64, or their unsigned equivalents) don't have compelling advantages.
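The trade-off between tag types can be checked with `sizeof` on hand-written stand-ins for "tag + largest case". This sketch uses hypothetical types with a 1-byte or 8-byte payload. It only illustrates padding; it is not necessarily the layout an [<UnmanagedUnion>] compiler would emit:

```fsharp
open System.Runtime.InteropServices

// Hypothetical "tag + payload" stand-ins; only the field types differ between them
[<Struct; StructLayout(LayoutKind.Sequential)>]
type ByteTagBytePayload = { Tag: byte; Payload: byte }

[<Struct; StructLayout(LayoutKind.Sequential)>]
type Int32TagBytePayload = { Tag: int32; Payload: byte }

[<Struct; StructLayout(LayoutKind.Sequential)>]
type Int32TagInt64Payload = { Tag: int32; Payload: int64 }

[<Struct; StructLayout(LayoutKind.Sequential)>]
type NativeTagInt64Payload = { Tag: nativeint; Payload: int64 }

printfn "byte tag,      byte payload:  %d bytes" sizeof<ByteTagBytePayload>
printfn "int32 tag,     byte payload:  %d bytes" sizeof<Int32TagBytePayload>
printfn "int32 tag,     int64 payload: %d bytes" sizeof<Int32TagInt64Payload>
printfn "nativeint tag, int64 payload: %d bytes" sizeof<NativeTagInt64Payload>
```

On a 64-bit process these are expected to be 2, 8, 16 and 16 bytes. A byte tag only shrinks the struct when the largest case has a small alignment. With an int64 payload, an int32 tag is padded to the same 16 bytes that a nativeint tag would occupy.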
The tag type could, optionally, be user-defined by passing the desired type to the attribute:
open System

type UnmanagedUnionWithTagAttribute(t: Type) =
    inherit Attribute()
    // If not specified we provide a default
    new () = UnmanagedUnionWithTagAttribute(typeof<int32>)
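A minimal sketch of how such an attribute could be declared and read back through reflection. `AttributeUsage`, the `TagType` property and the `requestedTagType` helper are assumptions added for illustration. Today the attribute compiles on a struct union but has no effect on its layout; it only records the request:

```fsharp
open System

// Assumed shape of the attribute; TagType is a hypothetical property exposing the choice
[<AttributeUsage(AttributeTargets.Class ||| AttributeTargets.Struct)>]
type UnmanagedUnionWithTagAttribute(t: Type) =
    inherit Attribute()
    // If not specified we provide a default
    new () = UnmanagedUnionWithTagAttribute(typeof<int32>)
    member _.TagType = t

// Requests a byte tag explicitly
[<Struct; UnmanagedUnionWithTag(typeof<byte>)>]
type Small =
    | S0
    | S1 of s1: byte

// Uses the parameterless constructor, i.e. the int32 default
[<Struct; UnmanagedUnionWithTag>]
type Default =
    | D0
    | D1 of d1: int

// Reads the requested tag type back from metadata, falling back to int32
let requestedTagType (t: Type) =
    match t.GetCustomAttributes(typeof<UnmanagedUnionWithTagAttribute>, false) with
    | [| :? UnmanagedUnionWithTagAttribute as attr |] -> attr.TagType
    | _ -> typeof<int32>

printfn "Small:   %s" (requestedTagType typeof<Small>).Name
printfn "Default: %s" (requestedTagType typeof<Default>).Name
```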
Support for Explicit layout and Packing
Normally, the memory layout of a struct with [<StructLayout(LayoutKind.Sequential)>] can be further controlled by specifying the Pack field.
It is not anticipated that [<StructLayout>] could be applied to a struct attributed with [<UnmanagedUnion>].
This kind of custom layout would first require the suggestion "Allow the union pattern to be implemented explicitly" to be implemented.
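For comparison, here is what Pack does on an ordinary hand-written struct today. This is the kind of control that would not be available on an [<UnmanagedUnion>] type under this proposal. `Unpacked` and `Packed` are illustrative names:

```fsharp
open System.Runtime.InteropServices

// Default packing: the int64 payload is aligned, leaving padding after the 1-byte tag
[<Struct; StructLayout(LayoutKind.Sequential)>]
type Unpacked =
    val mutable Tag: byte
    val mutable Payload: int64

// Pack = 1: the payload immediately follows the tag, at the cost of an unaligned int64
[<Struct; StructLayout(LayoutKind.Sequential, Pack = 1)>]
type Packed =
    val mutable Tag: byte
    val mutable Payload: int64

printfn "default packing: %d bytes" sizeof<Unpacked>
printfn "Pack = 1:        %d bytes" sizeof<Packed>
```

`Packed` is expected to take 9 bytes. `Unpacked` is expected to take 16 bytes on a 64-bit process, which mirrors the size-versus-alignment trade-off discussed for a byte tag.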
Pros and Cons
The advantages of making this adjustment to F# are better performance and the ability to interoperate with native libraries that use a tagged union without additional ceremony or allocations due to active patterns.
The disadvantages of making this adjustment to F# are that the limitations of the proposal may mean its usage is rare, and that it is one more thing to do.
Extra information
Estimated cost (XS, S, M, L, XL, XXL): M
Related suggestions:
Affidavit (please submit!)
Please tick these items by placing a cross in the box:
- This is not a question (e.g. like one you might ask on StackOverflow) and I have searched StackOverflow for discussions of this issue
- This is a language change and not purely a tooling change (e.g. compiler bug, editor support, warning/error messages, new warning, non-breaking optimisation) belonging to the compiler and tooling repository
- This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it
- I have searched both open and closed suggestions on this site and believe this is not a duplicate
Please tick all that apply:
- This is not a breaking change to the F# language design
- I or my company would be willing to help implement and/or test this
In our codebase we have only one DU type that we would want this for, but it has a pivotal place and is used for many calculations and needs to be high-performance.
However, if this feature is not implemented in the F# compiler, the alternative is to convert the DU type into a manually defined struct using the exact code that you posted (thanks for that; very useful!). A disadvantage here is that matching would be worse, but matching on the tag and using AsCase() methods doesn't seem so bad for occasional use, and active patterns would still work.
So the feature seems useful, but the workaround may be good enough.
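The workaround described in this comment (matching on the tag and calling AsCase() methods) might look like the following sketch. The `Shape` type, its cases and the `AsCircle`/`AsRect` names are hypothetical. Each accessor checks the tag before reading the overlapped storage, so reading the wrong case fails loudly instead of reinterpreting bytes:

```fsharp
open System
open System.Runtime.InteropServices

// Hypothetical hand-written replacement for a struct DU with two unmanaged cases
type ShapeTag =
    | Circle = 0
    | Rect = 1

[<Struct>]
type CircleCase = { Radius: float }

[<Struct>]
type RectCase = { W: float; H: float }

// Both cases share the same storage at offset 0
[<Struct; StructLayout(LayoutKind.Explicit)>]
type ShapeCases =
    [<FieldOffset(0)>] val mutable Circle: CircleCase
    [<FieldOffset(0)>] val mutable Rect: RectCase

[<Struct>]
type Shape =
    { Tag: ShapeTag; Cases: ShapeCases }

    static member NewCircle(radius: float) =
        let mutable cases = Unchecked.defaultof<ShapeCases>
        cases.Circle <- { Radius = radius }
        { Tag = ShapeTag.Circle; Cases = cases }

    static member NewRect(w: float, h: float) =
        let mutable cases = Unchecked.defaultof<ShapeCases>
        cases.Rect <- { W = w; H = h }
        { Tag = ShapeTag.Rect; Cases = cases }

    // AsCase-style accessors: validate the tag, then read the overlapped field
    member this.AsCircle() =
        if this.Tag <> ShapeTag.Circle then invalidOp "Shape is not a Circle"
        this.Cases.Circle

    member this.AsRect() =
        if this.Tag <> ShapeTag.Rect then invalidOp "Shape is not a Rect"
        this.Cases.Rect

// Match on the tag, then call the corresponding AsCase method
let area (shape: Shape) =
    match shape.Tag with
    | ShapeTag.Circle ->
        let c = shape.AsCircle()
        Math.PI * c.Radius * c.Radius
    | ShapeTag.Rect ->
        let r = shape.AsRect()
        r.W * r.H
    | _ -> invalidOp "Unknown tag"
```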