Figure out if we can use POD data structures
Closed this issue · 4 comments
Using Plain Old Data (POD) data structures makes allocating and deallocating them significantly faster.
As we are using a generator, it is significantly easier to confirm correctness around POD allocation / deallocation.
What keeps us from having PODs are the `std::vector`s, `std::string`s and `std::unique_ptr`s inside our structs. They are also the source of most of our performance loss.
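As a rough illustration of the difference, here is a minimal sketch that checks POD-ness at compile time. The struct names `t_meta_dynamic` and `t_meta_pod` and the helper `is_pod_like` are made up for this example and are not taken from the generator. It uses `std::is_trivial` plus `std::is_standard_layout` rather than `std::is_pod`, since the latter is deprecated in C++20.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical struct in the current style: members own heap memory,
// so construction and destruction are non-trivial.
struct t_meta_dynamic {
    std::string name;
    std::vector<int> values;
};

// Hypothetical POD counterpart: a raw string pointer plus a count and a
// pointer to storage owned elsewhere (for example an arena).
struct t_meta_pod {
    const char *name;
    int num_values;
    int *values;
};

// "POD" in the C++11 sense: trivial and standard-layout.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(!is_pod_like<t_meta_dynamic>(), "std::string/std::vector members break POD-ness");
static_assert(is_pod_like<t_meta_pod>(), "raw pointers and ints keep the struct POD");
```

A generator could emit one such `static_assert` per struct so that a non-POD member slipping in fails the build.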
I'm thinking about having arenas for every complex type which can occur more than once (unbounded, or a child of an unbounded element). We can go over the input once more to count the types and allocate the arenas. It's possible that one more round of chasing PugiXML pointers beats millions of small allocations. Then we can have `int num_t_X` and `t_X *X_list` members instead of dynamic vectors.
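The count-then-allocate idea could look roughly like the sketch below. To keep it self-contained it does not use PugiXML: `meta_stub` and `metadata_stub` are stand-ins for the parsed input, and `t_meta`, `t_metadata`, `meta_arena`, `count_metas` and `load_all` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Stand-ins for the parsed input; real code would walk PugiXML nodes instead.
struct meta_stub { int value; };
struct metadata_stub { std::vector<meta_stub> metas; };

// POD output types (hypothetical names).
struct t_meta { int value; };
struct t_metadata {
    int num_meta;
    t_meta *meta_list; // slice of the shared arena, not owned by this struct
};

struct meta_arena {
    t_meta *slots;
    int capacity;
    int used;
};

// Pass 1: count every meta element across all parents.
int count_metas(const std::vector<metadata_stub> &input) {
    int n = 0;
    for (const metadata_stub &m : input) n += static_cast<int>(m.metas.size());
    return n;
}

// Pass 2: one allocation for all t_meta, then give each parent a contiguous
// slice of it. `out` must have room for input.size() elements.
meta_arena load_all(const std::vector<metadata_stub> &input, t_metadata *out) {
    meta_arena arena;
    arena.capacity = count_metas(input);
    arena.used = 0;
    arena.slots = nullptr;
    if (arena.capacity > 0) {
        arena.slots = static_cast<t_meta *>(std::malloc(sizeof(t_meta) * arena.capacity));
        assert(arena.slots != nullptr);
    }
    for (std::size_t i = 0; i < input.size(); ++i) {
        out[i].num_meta = static_cast<int>(input[i].metas.size());
        out[i].meta_list = arena.slots + arena.used;
        for (const meta_stub &s : input[i].metas) {
            assert(arena.used < arena.capacity);
            arena.slots[arena.used++].value = s.value;
        }
    }
    return arena;
}
```

The whole arena is released with a single `std::free(arena.slots)`, which is where the deallocation savings would come from.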
This also applies to `std::unique_ptr`s. If the referred type is a child of an unbounded element (think `<meta>` inside `<metadata>` inside `<node>`), we can again have a pointer to an arena slot. It also makes sense to simply place the child element's struct inside the parent element's struct instead of using a pointer. This creates trouble with recursive types like pb_type, but I think we can use a pointer there.
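A minimal sketch of the two layouts follows: a child struct embedded by value, and a recursive type that falls back to a count plus a pointer. The field names and the helper `count_pb_types` are assumptions made for illustration; only the type names `pb_type`, `metadata` and `node` come from the discussion.

```cpp
#include <cassert>
#include <type_traits>

// Simplified child type that appears at most once per parent.
struct t_metadata {
    int num_meta;
};

// The child is placed directly inside the parent: no pointer, no allocation.
struct t_node {
    int id;
    t_metadata metadata;
};

// A recursive type cannot contain itself by value, so it keeps a count and
// a pointer to storage owned elsewhere (for example arena slots).
struct t_pb_type {
    int num_pb_type;
    t_pb_type *pb_type_list;
};

static_assert(std::is_trivial<t_node>::value && std::is_standard_layout<t_node>::value,
              "embedding a POD child keeps the parent POD");
static_assert(std::is_trivial<t_pb_type>::value && std::is_standard_layout<t_pb_type>::value,
              "a raw self-pointer keeps the recursive type POD");

// Walks the recursive structure and counts the root plus all descendants.
int count_pb_types(const t_pb_type *root) {
    assert(root != nullptr);
    int n = 1;
    for (int i = 0; i < root->num_pb_type; ++i) {
        n += count_pb_types(&root->pb_type_list[i]);
    }
    return n;
}
```

Embedding by value only works when the child occurs at most once and is not recursive; anything else still needs the count-and-pointer form.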
I can't think of an easy way to manage an arena of strings. It's easy to cheat by replacing them with `const char *`s and calls to `strdup`.
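For comparison with the `strdup` shortcut, here is one possible shape for a string arena, offered only as a sketch and not as something proposed in this thread. It assumes a counting pass can sum the byte length of every string (including terminators) before loading. All names (`string_arena`, `string_arena_create`, `string_arena_copy`, `string_arena_destroy`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// One contiguous buffer holding all strings back to back, NUL-terminated.
struct string_arena {
    char *buf;
    std::size_t capacity;
    std::size_t used;
};

// total_bytes is the sum of strlen(s) + 1 over all strings, from a counting pass.
string_arena string_arena_create(std::size_t total_bytes) {
    string_arena a;
    a.buf = static_cast<char *>(std::malloc(total_bytes ? total_bytes : 1));
    assert(a.buf != nullptr);
    a.capacity = total_bytes;
    a.used = 0;
    return a;
}

// Copies s into the arena and returns a pointer that stays valid until
// the arena is destroyed. Replaces one strdup call.
const char *string_arena_copy(string_arena &a, const char *s) {
    std::size_t n = std::strlen(s) + 1; // include the terminator
    assert(a.used + n <= a.capacity);
    char *dst = a.buf + a.used;
    std::memcpy(dst, s, n);
    a.used += n;
    return dst;
}

// Frees every string at once.
void string_arena_destroy(string_arena &a) {
    std::free(a.buf);
    a.buf = nullptr;
    a.capacity = 0;
    a.used = 0;
}
```

The structs would still hold plain `const char *` members, as with `strdup`, but teardown becomes a single `free` instead of one per string. The cost is the extra pass to measure the strings.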
We could potentially replace the arena of strings with the idstring stuff you looked at previously -> https://github.com/mithro/idstring
The latest commit uses PODs: https://gist.github.com/duck2/fdf7e6ae3f33204e48a8a0a7e0ee08f0
Well, it doesn't look like we are going to regress.