replay/ngx_http_consistent_hash

Why are we reducing the hash lookup to MMC_CONSISTENT_BUCKETS?


bnnk commented
    for (i = 0; i < us->servers->nelts; i++) {
        for (j = 0; j < server[i].naddrs; j++) {
            for (k = 0; k < (MMC_CONSISTENT_POINTS * server[i].weight); k++) {
                ngx_snprintf(hash_data, HASH_DATA_LENGTH, "%V-%ui%Z", &server[i].addrs[j].name, k);
                continuum->nodes[continuum->nnodes].sockaddr = server[i].addrs[j].sockaddr;
                continuum->nodes[continuum->nnodes].socklen = server[i].addrs[j].socklen;
                continuum->nodes[continuum->nnodes].name = server[i].addrs[j].name;
                continuum->nodes[continuum->nnodes].name.data[server[i].addrs[j].name.len] = 0;
                continuum->nodes[continuum->nnodes].point = ngx_crc32_long(hash_data, ngx_strlen(hash_data));
                continuum->nnodes++;
            }
        }
    }

    qsort(continuum->nodes, continuum->nnodes,
            sizeof(ngx_http_upstream_consistent_hash_node),
            (const void*) ngx_http_upstream_consistent_hash_compare_continuum_nodes);

    for (i = 0; i < MMC_CONSISTENT_BUCKETS; i++) {
        buckets->buckets[i] =
            ngx_http_upstream_consistent_hash_find(continuum, step * i);
    }

What is the advantage of calculating MMC_CONSISTENT_POINTS virtual nodes for each server and then reducing them all to MMC_CONSISTENT_BUCKETS?
I mean, say there are 4 addresses: we generate 4 * MMC_CONSISTENT_POINTS nodes, reduce them to MMC_CONSISTENT_BUCKETS, and then do lookups only on the buckets. How are we sure each server is spread equally across these buckets, and what advantage do we get from this?

rgds
Nandhan

rzbdz commented

AFAIAC, this reduces the time complexity of a hash lookup to O(1).

It's basically because the hash ring covers the range 0x0 to 0xffffffff: to look up a hashed key directly on the sorted continuum, you have to use binary search, which is O(log n) per request. Precomputing the buckets moves that cost to configuration time, so each request needs only a single array index.

As for the distribution, I don't think it would be a problem: even looking up directly in the hash ring doesn't guarantee perfectly even routing to all the upstream servers you've got. As long as every client is stably proxied to one of the servers, and requests can be transferred when one of the backend servers goes down, it works fine as a load-balancing reverse proxy. (And the default 1024 bucket slots are probably enough to get a good distribution in most scenarios.)

I don't know if this makes sense to you.