qpfiffer/Simple-Sparsehash

Optimization: Avoiding bitmap access


During execution of test_dict_lots_of_set, the function _rehash_and_grow_table
resizes the dictionary from 1048576 to 2097152 buckets (bucket_max), but bucketcount
is only 838861, i.e. only about 80% of the slots are actually occupied.

So instead of iterating over every slot in _rehash_and_grow_table with the following:
for (i = 0; i < dict->bucket_max; i++) {
size_t bucket_siz = 0;
const struct sparse_bucket *bucket = sparse_array_get(dict->buckets, i, &bucket_siz);

it would be faster (20% fewer iterations) if there were some kind of iterator:

sparse_array_iter iter;
sparse_array_init_iter(&iter, dict->buckets);
while (sparse_array_get_next(&iter, &next_i, &next_bucket, &next_bucket_size)) {

where the next_* arguments are all out parameters.

The current position in dict->buckets could also be stored in iter, so the call to
_sparse_array_group_get and its call to position_to_offset could be avoided altogether.
Bitmap access is slow, and avoiding it would improve performance on top of the 20% fewer iterations.
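
Here is a minimal sketch of such an iterator. The group layout (packed items plus a count) and all field names are assumptions for illustration only; the library's real internals differ. Recovering the original index next_i would still need a bitmap lookup, but rehashing only needs the stored buckets themselves, so this sketch yields just those:

#include <stddef.h>

/* Assumed group layout: each group stores only its occupied buckets,
 * packed contiguously, together with how many there are. */
struct sparse_group {
    void *items;      /* packed array of stored buckets */
    size_t item_size; /* size of one stored bucket */
    size_t count;     /* number of occupied slots in this group */
};

struct sparse_array_iter {
    const struct sparse_group *groups; /* all groups of the array */
    size_t group_count;
    size_t group; /* current group */
    size_t slot;  /* next packed slot within the current group */
};

void sparse_array_init_iter(struct sparse_array_iter *iter,
                            const struct sparse_group *groups,
                            size_t group_count) {
    iter->groups = groups;
    iter->group_count = group_count;
    iter->group = 0;
    iter->slot = 0;
}

/* Yields each stored bucket in turn; returns 0 when exhausted. Walking the
 * packed slots directly means no per-element bitmap lookup is needed. */
int sparse_array_get_next(struct sparse_array_iter *iter,
                          const void **next_bucket,
                          size_t *next_bucket_size) {
    while (iter->group < iter->group_count) {
        const struct sparse_group *g = &iter->groups[iter->group];
        if (iter->slot < g->count) {
            *next_bucket = (const char *)g->items + iter->slot * g->item_size;
            *next_bucket_size = g->item_size;
            iter->slot++;
            return 1;
        }
        iter->group++; /* group exhausted, move on */
        iter->slot = 0;
    }
    return 0;
}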

The same iterator approach could be applied to sparse_dict_free as well (though it is not really necessary there).

EDIT:
I've implemented an iterator to test this; see https://gist.github.com/je-so/48e11093efd33ece86e7
It does not gain that much speed (less than 5%).

By the way: in the function _sparse_array_group_get, the two lines

if (item_siz == NULL)
    return NULL;

should be written as

if (*(size_t*)item_siz == 0)
    return NULL;

so that the check tests the size_t value stored at item_siz instead of the pointer itself.

EDIT2:
Storing the precomputed hash value in the bucket reduces execution time by more than 15%.
During probing, compare the hash values first, which avoids comparing the strings in almost all cases!
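
A sketch of what I mean (the extra hash field and this struct layout are assumptions, not the library's current struct sparse_bucket):

#include <stdint.h>
#include <string.h>

/* Assumed bucket layout with the hash cached next to key and value. */
struct hashed_bucket {
    uint64_t hash;   /* precomputed hash of key, stored at insert time */
    const char *key;
    size_t key_len;
    const void *val;
};

/* Probe comparison: check the cheap cached hash first and only fall back
 * to the expensive memcmp when the hashes already agree. */
static int bucket_matches(const struct hashed_bucket *b, uint64_t hash,
                          const char *key, size_t key_len) {
    if (b->hash != hash)
        return 0; /* fast path: rejects almost all non-matching buckets */
    return b->key_len == key_len && memcmp(b->key, key, key_len) == 0;
}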

EDIT3:
To really speed things up, I suspect it is better not to manage whole structs in the sparse array, because every insert potentially costs two memcpy calls plus one malloc and one free. If you could come up with cleverer memory management, where only pointers are moved around and the bigger structs are stored in stable memory blocks, things would improve. During insertion of 1000000 elements, rehash is called 10 times, and each time all stored structs are moved in memory up to 48 times (48 entries per group means up to 48 reallocations per group at most!).
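
One way to sketch the idea (all names here are illustrative, not the library's API): the sparse array holds only pointers, while each bucket lives in its own stable allocation that never moves after creation:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct stable_bucket {
    uint64_t hash;
    char *key;
    size_t key_len;
    void *val;
};

/* Allocate the bucket once in stable memory. Rehashing and group growth
 * then shuffle only the pointers, never the bucket contents themselves. */
static struct stable_bucket *bucket_create(uint64_t hash, const char *key,
                                           size_t key_len, void *val) {
    struct stable_bucket *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->key = malloc(key_len + 1);
    if (b->key == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->key, key, key_len); /* keep our own stable copy of the key */
    b->key[key_len] = '\0';
    b->hash = hash;
    b->key_len = key_len;
    b->val = val;
    return b;
}

The sparse array would then store struct stable_bucket * entries, so a rehash moves one pointer per element instead of the whole struct.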

Another speed-up would be to rewrite the memcpy() code in _sparse_array_group_set to use realloc() + memmove() instead. I think there are big wins to be had there.
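
A rough sketch of that rewrite (the buffer layout is assumed: count items of item_size bytes, with the new item going in at index offset):

#include <stdlib.h>
#include <string.h>

/* Grow the group in place and open a slot at `offset` for the new item.
 * realloc can often extend the block without copying anything, and memmove
 * then shifts only the tail, instead of the current malloc of a bigger
 * buffer + two memcpys + free of the old one. */
static void *group_insert(void *items, size_t count, size_t item_size,
                          size_t offset, const void *new_item) {
    char *buf = realloc(items, (count + 1) * item_size);
    if (buf == NULL)
        return NULL;
    memmove(buf + (offset + 1) * item_size, /* shift tail right one slot */
            buf + offset * item_size,
            (count - offset) * item_size);
    memcpy(buf + offset * item_size, new_item, item_size);
    return buf;
}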