Figure out if we can use POD data structures

Question

Closed this issue 5 years ago · 4 comments

Using Plain Old Data (POD) structures makes allocating and deallocating them significantly faster, since they need no constructor or destructor calls and can be handled as raw blocks of memory.

Because the code comes from a generator, it is also much easier to confirm that POD allocation / deallocation is correct.
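One way the generator could guard the POD property is to emit a compile-time check next to each struct. The following is only a sketch: the struct names and the is_pod_like helper are made up for illustration and are not part of the actual generated code. It combines std::is_trivial and std::is_standard_layout, which together correspond to the classic POD definition (std::is_pod itself is deprecated since C++20).

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical struct as a generator might emit it today:
// the std::string and std::vector members own heap memory.
struct t_node_dynamic {
    std::string name;
    std::vector<int> children;
};

// Hypothetical POD counterpart: only scalars and raw pointers.
struct t_node_pod {
    const char *name;
    int num_children;
    int *children_list;
};

// Assumed helper (not a standard trait): trivial + standard layout == POD.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// A generator could emit one of these per struct so that an accidental
// non-POD member is caught at compile time.
static_assert(is_pod_like<t_node_pod>(), "t_node_pod must stay POD");
static_assert(!is_pod_like<t_node_dynamic>(), "std::string/std::vector break POD-ness");
```

If a later schema change introduces a non-trivial member, the build fails instead of silently losing the fast path.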

Answer 1 · 2019-07-13T00:41:55.000Z

What keeps us from having PODs are std::vectors, std::strings and std::unique_ptrs inside our structs. They are also the source of most of our performance loss.

I'm thinking about having an arena for every complex type which can occur more than once (unbounded, or a child of an unbounded element). We can go over the input once more to count the instances of each type and allocate the arenas up front. It's possible that one more round of chasing PugiXML pointers beats millions of small allocations. Then we can have int num_t_X and t_X *X_list members instead of dynamic vectors.
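A rough sketch of the count-then-allocate idea, under some assumptions: a nested std::vector stands in for the parsed PugiXML tree (no PugiXML calls are shown), and t_node, t_meta, Arenas, build_arenas and free_arenas are hypothetical names. The first pass counts instances, the arenas are allocated once, and the second pass hands out slices of the arena as num_meta / meta_list pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical POD types following the int num_t_X / t_X *X_list pattern.
struct t_meta {
    int value;
};

struct t_node {
    int id;
    int num_meta;
    t_meta *meta_list; // points into the shared t_meta arena
};

// One arena per type that can occur more than once.
struct Arenas {
    t_node *nodes;
    std::size_t num_nodes;
    t_meta *metas;
    std::size_t num_metas;
};

// Stand-in input: input[i] holds the <meta> values of the i-th <node>.
Arenas build_arenas(const std::vector<std::vector<int>> &input) {
    // Pass 1: count instances of every arena-backed type.
    std::size_t total_meta = 0;
    for (const auto &node_metas : input) total_meta += node_metas.size();

    // One allocation per type instead of one per element.
    // Allocate at least one slot so that an empty input is not an error.
    Arenas a;
    a.num_nodes = input.size();
    a.num_metas = total_meta;
    a.nodes = static_cast<t_node *>(std::calloc(a.num_nodes ? a.num_nodes : 1, sizeof(t_node)));
    a.metas = static_cast<t_meta *>(std::calloc(a.num_metas ? a.num_metas : 1, sizeof(t_meta)));
    if (!a.nodes || !a.metas) std::abort();

    // Pass 2: fill the structs and hand out consecutive arena slots.
    std::size_t next_meta = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        t_node &node = a.nodes[i];
        node.id = static_cast<int>(i);
        node.num_meta = static_cast<int>(input[i].size());
        node.meta_list = a.metas + next_meta;
        for (int v : input[i]) a.metas[next_meta++].value = v;
    }
    assert(next_meta == total_meta);
    return a;
}

// Deallocation is two free() calls, regardless of the element count.
void free_arenas(Arenas &a) {
    std::free(a.nodes);
    std::free(a.metas);
    a = Arenas{};
}
```

Because every t_node only stores a pointer into the arena, individual nodes never own memory, which is what keeps them POD.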

This also applies to std::unique_ptrs. If the referred type is a child of an unbounded element (think <meta> inside <metadata> inside <node>), we can again have a pointer to an arena slot. For a child that occurs at most once, it also makes sense to simply place the child element's struct inside the parent element's struct instead of holding a pointer to it. This creates trouble with recursive types like pb_type, since a struct cannot contain itself by value, but I think we can use a pointer there.
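To illustrate the two cases, here is a minimal sketch with hypothetical field layouts (only the element names node, metadata, meta and pb_type come from the discussion above). The single metadata child is embedded by value, while the recursive pb_type keeps a count plus a pointer, which could point into an arena.

```cpp
#include <cassert>

struct t_meta {
    const char *name;
    const char *value;
};

struct t_metadata {
    int num_meta;
    t_meta *meta_list; // unbounded child: pointer into an arena
};

struct t_node {
    int id;
    t_metadata metadata; // single child: embedded by value, no allocation
};

// Recursive type: embedding by value would give an infinitely sized
// struct, so the children are reached through a pointer instead.
struct t_pb_type {
    const char *name;
    int num_pb_type;
    t_pb_type *pb_type_list;
};

// Walks the pointer-based recursion and counts this pb_type plus all
// of its descendants.
int count_pb_types(const t_pb_type &pb) {
    assert(pb.num_pb_type == 0 || pb.pb_type_list != nullptr);
    int total = 1;
    for (int i = 0; i < pb.num_pb_type; ++i) {
        total += count_pb_types(pb.pb_type_list[i]);
    }
    return total;
}
```

All four structs remain aggregates of scalars and raw pointers, so they can be initialized with plain brace lists and copied with memcpy.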

I can't think of an easy way to manage an arena of strings. It's easy to cheat by replacing them with const char * members filled in by calls to strdup.
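The strdup shortcut might look roughly like this. Since strdup is POSIX rather than standard C++, the sketch uses a small stand-in called dup_cstr; that helper, make_meta, free_meta and the t_meta layout are all assumptions for illustration. Note that this keeps the struct POD but still performs one small allocation per string, which is why it is a cheat rather than a real string arena.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical POD struct: strings are plain C pointers.
struct t_meta {
    const char *name;
    const char *value;
};

// Portable stand-in for POSIX strdup: copies s into malloc'ed memory.
char *dup_cstr(const char *s) {
    assert(s != nullptr);
    std::size_t len = std::strlen(s) + 1; // include the terminating '\0'
    char *copy = static_cast<char *>(std::malloc(len));
    if (copy) std::memcpy(copy, s, len);
    return copy;
}

// Fills a t_meta with owned copies of both strings.
t_meta make_meta(const char *name, const char *value) {
    t_meta m;
    m.name = dup_cstr(name);
    m.value = dup_cstr(value);
    return m;
}

// The struct has no destructor, so the strings must be freed by hand.
void free_meta(t_meta &m) {
    std::free(const_cast<char *>(m.name));
    std::free(const_cast<char *>(m.value));
    m.name = nullptr;
    m.value = nullptr;
}
```

The manual free_meta call is the price of dropping std::string; with generated code, the matching free function can be emitted alongside each struct.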

Answer 2 · 2019-07-13T01:20:36.000Z

We could potentially replace the arena of strings with the idstring stuff you looked at previously -> https://github.com/mithro/idstring

Answer 3 · 2019-07-20T01:49:01.000Z

The latest commit uses PODs: https://gist.github.com/duck2/fdf7e6ae3f33204e48a8a0a7e0ee08f0

Answer 4 · 2019-07-31T23:36:29.000Z

Well, it doesn't look like we are going to regress.