[Performance] Implement StackArray of several sizes and use it for storing Attributes and Children
Closed this issue · 0 comments
twop commented
Context
Currently adding an attribute to a widget involves a heap allocation of a new array.
Copy pasting (at the moment of creation of this issue):
static member inline AddScalarAttribute(this: ^T :> IWidgetBuilder, attr: ScalarAttribute) =
let attribs = this.ScalarAttributes
let attribs2 = Array.zeroCreate(attribs.Length + 1) // <--- allocating a new one
Array.blit attribs 0 attribs2 0 attribs.Length
attribs2.[attribs.Length] <- attr
let result =
(^T: (new : ScalarAttribute [] * WidgetAttribute [] * WidgetCollectionAttribute [] -> ^T) (attribs2,
this.WidgetAttributes,
this.WidgetCollectionAttributes))
result
Problems with that approach
- We allocate a new array on the heap for each attribute. Note that this gives amazing ergonomics due to immutability that allows safe reuse of widgets, so it is there for a good reason!
- Most of the time (like 99.9%) these prev allocated arrays (
let attribs = this.ScalarAttributes
) are short lived and apply pressure on GC - Cache misses during attribute diffing. E.g. In order to traverse attributes we need to follow the pointer where these attributes are located.
Proposed solution
Use a technique commonly used in high performance computing when you know that arrays are going to be mostly small.
pseudocode:
[<Struct>]
type StackArrayN<'a> = {
Size: uint
Item0: 'a // initialized with 'default'
Item1: 'a
....
ItemN: 'a
PostN: 'a[] // needs to be initialized with null. e.g. 'default'
}
As you might guessed it is a bit tricky to work with and it needs to be explicitly written for each N. BUT it doesn't allocate anything before we reach N. Which makes traversal and creation very CPU cache friendly (in a happy path)
Luckily we have only a handful of use cases when it is applicable
Usecases
- Storing Attributes in a Widget, double win for creation and diffing perf
- Children of Widgets, usually we have at most 1 type of children attributes with some exceptions, thus StackArray1 might be a good candidate for those, although I'm not sure if we can actually do that given the recursive nature of types.
- Any other small collections
Notes
- Exact N needs to be determined from some existing code bases. Ideally we would run a project with a build that can capture usage profile. I would aim for p90 if it is not too big of a number. From what I saw it is probably somewhere between 5 and 10. @TimLariviere you are probably more knowledgeable in that regard.
Downsides
- Storing Widgets will consume more memory. Assuming we use N=5(TBD) for Attributes. For 1000 Widgets (which is a pretty large number) 5 * 32 bytes (upper limit of Attribute size) * 1000 = 150Kb. Totally fine by me
- It is super cumbersome to work with, how to iterate them with no dynamic dispatch nor allocations?
- Sorting is probably going to be a pain