klapaucius/vector-hashtables

Support hashtables with more than 2G elements

Closed this issue · 4 comments

Some users run Haskell on machines with over 1TB memory, it's not impossible to have over 2G elements of hashtable.

primes = UI.fromList [
3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631,
761, 919, 1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143,
14591, 17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363,
156437, 187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897,
1162687, 1395263, 1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471,
7199369, 8639249, 10367101, 12440537, 14928671, 17914409, 21497293, 25796759, 30956117,
37147349, 44576837, 53492207, 64190669, 77028803, 92434613, 110921543, 133105859, 159727031,
191672443, 230006941, 276008387, 331210079, 397452101, 476942527, 572331049, 686797261,
824156741, 988988137, 1186785773, 1424142949, 1708971541, 2050765853 ]

What's the principle behind the choice of prime numbers here? How can it be extended further, to cover entire Int64 range?

the principle is:

[3, 7 ... primeNumber_n, primeNumber_n+k > 1.2*primeNumber_n ...

so it can easily be extended given a sequence of prime numbers

Something like this

Prelude Math.NumberTheory.Primes Data.List> takeWhile (\x -> x < fromIntegral(maxBound :: Int)) . map unPrime $ iterate (\p -> nextPrime (ceiling $ 1.2 * fromIntegral (unPrime p))) (nextPrime 2050765853)
[2050765853,2460919037,2953102909,3543723499,4252468211,5102961869,6123554263,7348265123,8817918149,10581501797,12697802207,15237362659,18284835221,21941802293,26330162767,31596195347,37915434433,45498521329,54598225597,65517870827,78621445013,94345734019,113214880859,135857857043,163029428461,195635314181,234762377023,281714852449,338057822959,405669387577,486803265103,584163918127,700996701773,841196042177,1009435250623,1211322300767,1453586761009,1744304113229,2093164935877,2511797923093,3014157507727,3616989009287,4340386811153,5208464173387,6250157008087,7500188409719,9000226091663,10800271310027,12960325572053,15552390686477,18662868823783,22395442588543,26874531106253,32249437327559,38699324793071,46439189751707,55727027702123,66872433242599,80246919891143,96296303869429,115555564643329,138666677571997,166400013086401,199680015703691,239616018844471,287539222613371,345047067136049,414056480563273,496867776675943,596241332011217,715489598413487,858587518096247,1030305021715513,1236366026058641,1483639231270373,1780367077524493,2136440493029393,2563728591635279,3076474309962373,3691769171954869,4430123006345843,5316147607615027,6379377129138049,7655252554965709,9186303065958853,11023563679150639,13228276414980799,15873931697977019,19048718037572429,22858461645086941,27430153974104333,32916184768925261,39499421722710373,47399306067252449,56879167280702959,68255000736843607,81906000884212361,98287201061054833,117944641273265887,141533569527919211,169840283433503041,203808340120203719,244570008144244481,293484009773093401,352180811727712129,422616974073254567,507140368887905497,608568442665486697,730282131198584087,876338557438300843,1051606268925960971,1261927522711153157,1514313027253383839,1817175632704060721,2180610759244872743,2616732911093847187,3140079493312616483,3768095391975139873,4521714470370167831,5426057364444201011,6511268837333041219,7813522604799648773]

but probably fewer than that because each element is a bunch of words

the principle is:

[3, 7 ... primeNumber_n, primeNumber_n+k > 1.2*primeNumber_n ...

so it can easily be extended given a sequence of prime numbers

Thanks for the explanation! So fundamentally there is enough freedom to select a prime around +20%.

I have an idea to choose primes such that division by them (which is ubiquitous throughout all routines) can be replaced by wide multiplication and right shift, but have not worked out all details yet.

Fixed by #27.