Trying to understand HTreeMap

Question

Trying to understand HTreeMap

siddhsql opened this issue a year ago · 1 comments

the documentation of HTreeMap says:

Unlike other HashMaps it does not use fixed size Hash Table, and does not rehash all data when Hash Table grows

then later on the same page says:

HTreeMap uses Index Tree instead of growing Object[] for its Hash Table. Index Tree is sparse array like structure, which uses tree hierarchy of arrays. It is sparse, so unused entries do not occupy any space. It does not do rehashing (copy all entries to bigger array), but also it can not grow beyond its initial capacity (translation: size of HTreeMap is fixed).

Q.1: this is contradictory. which is correct?

same page then says:

Maximal Hash Table Size is calculated as: segment * node size ^ level count. The default maximal Hash Table Size is 8*16^4= 0.5 million entries

but when I looked at the code, node size seems to be 2^7 = 128 not 16.

int HTREEMAP_CONC_SHIFT = 3;
    int HTREEMAP_DIR_SHIFT = 7;
    int HTREEMAP_LEVELS = 4;

Q.2: what gives? is the doc wrong?

Q.3: further I can see the HTreeMap creates 8 indexTrees but each indexTree is a instance of org.eclipse.collections.api.map.primitive.MutableLongLongMap and is zero size to begin with and not constrained by any initial capacity. Is the documentation totally out of sync with reality?

Q.4: doc:

HTreeMap uses Index Tree instead of growing Object[] for its Hash Table. Index Tree is sparse array like structure, which uses tree hierarchy of arrays.

reality (I don't see any tree hierarchy of arrays. I just see a MutableLongLongMap for each segment (partition)):

val indexTrees: Array<MutableLongLongMap>,

the array is just used for segments e.g., the array size is 8 - 8 index trees for 8 segments.

Answer 1 · 2023-09-14T17:04:14.000Z

You are over-complicating it. Just use db.createHashMap.... and do not set any params you do not understand. Default values are quite ok.